Sensors 24 04013

This review paper discusses the advancements in predictive analytics models within the oil and gas (O&G) industry, focusing on machine learning techniques from 2021 to 2023. It highlights the importance of predictive analytics for optimizing operations, reducing costs, and enhancing decision-making, while also addressing challenges such as data management and algorithm effectiveness. The paper serves as a comprehensive guide for future research directions and the application of AI in improving predictive analytics in the O&G sector.

sensors

Review
A Review of Predictive Analytics Models in the Oil and
Gas Industries
Putri Azmira R Azmi 1 , Marina Yusoff 1,2,3, * and Mohamad Taufik Mohd Sallehud-din 4

1 College of Computing, Informatics and Mathematics, Universiti Teknologi MARA (UiTM), Shah Alam 40450, Selangor, Malaysia
2 Institute for Big Data Analytics and Artificial Intelligence (IBDAAI), Universiti Teknologi MARA (UiTM), Shah Alam 40450, Selangor, Malaysia
3 Faculty of Business, Sohar University, Sohar 311, Oman
4 PETRONAS Research Sdn Bhd, Petronas Research & Scientific, Jln Ayer Hitam, Bangi Government and Private Training Centre Area, Bandar Baru Bangi 43000, Selangor, Malaysia; [email protected]
* Correspondence: [email protected] or [email protected]

Abstract: Enhancing the management and monitoring of oil and gas processes demands the development of precise predictive analytic techniques. Over the past two years, oil and its prediction have advanced significantly using conventional and modern machine learning techniques. Several review articles detail the developments in predictive maintenance and the technical and non-technical aspects influencing the uptake of big data. The absence of references for machine learning techniques impacts the effective optimization of predictive analytics in the oil and gas sectors. This review paper offers readers thorough information on the latest machine learning methods utilized in this industry's predictive analytical modeling. It covers different forms of machine learning techniques used in predictive analytical modeling from 2021 to 2023 (91 articles). It provides an overview of the details of the papers that were reviewed, describing the model's categories, the data's temporality, field, and name, the dataset's type, predictive analytics (classification, clustering, or prediction), the models' input and output parameters, the performance metrics, the optimal model, and the model's benefits and drawbacks. In addition, it offers suggestions for future research directions to provide insights into the potential applications of the associated knowledge. This review can serve as a guide to enhance the effectiveness of predictive analytics models in the oil and gas industries.

Keywords: classification; clustering; machine learning; oil and gas; predictive analytics

Citation: R Azmi, P.A.; Yusoff, M.; Mohd Sallehud-din, M.T. A Review of Predictive Analytics Models in the Oil and Gas Industries. Sensors 2024, 24, 4013. https://doi.org/10.3390/s24124013

Academic Editor: Andrea Cataldo

Received: 6 May 2024
Revised: 28 May 2024
Accepted: 11 June 2024
Published: 20 June 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
As stated in the International Energy Agency's 2020 report, the oil and gas (O&G) sector plays an important role in the global economy and substantially contributes to fulfilling the world's energy needs. The efficient management and optimization of operations within this sector are important for ensuring a dependable energy supply, mitigating environmental impacts, and maximizing economic returns [1,2]. Predictive analytics uses statistical modeling, data mining, and ML to predict outcomes based on past data [3,4]. This approach has gained popularity and facilitates decision-making by considering qualitative and quantitative data. The practice involves evaluating several factors to determine the relevance of predictions, as highlighted by Sharma and Villányi [5]. Various well-known predictive analytics models, such as classification, clustering [6], and prediction models, are utilized in this context [7]. Predictive analytics is crucial in real-world scenarios within the O&G industry. Examples include its application in optimizing drilling operations, which is employed to adapt to the detection and identification of drill pipe stuck-up events [8]. In pipeline risk assessment, predictive analytics also validates the effectiveness of algorithms for calculating the need for strain in a pipe [9]. Furthermore, predictive analytics is employed in exploration and production to detect and classify events to minimize downtime, reduce maintenance costs, and prevent damage to installations in oil wells [10].

Sensors 2024, 24, 4013. https://doi.org/10.3390/s24124013 https://www.mdpi.com/journal/sensors


Predictive analytics in the O&G field can be better understood by in-depth knowledge of its past, present, and future situations. This includes pipelines, wells, and gas and oil models. Several articles describe the advancements in predictive maintenance and the technical and non-technical factors affecting significant data implementation. The review article recommended further research on integrating AI with other state-of-the-art technologies. AI has the potential to revolutionize maintenance techniques, and its ongoing development will indeed influence how the O&G sector develops in the future [11]. This is because there are still issues with AI methods and tools, such as overfitting, coincidence effects, and overtraining [12].

Furthermore, many studies have been conducted using various simulation methodologies for quantitative and qualitative predictive analytics in the O&G field in terms of classification, clustering, and prediction. In the last two years, ML models have been extensively applied to O&G predictive analytics to address the shortcomings of traditional numerical models. Figure 1 presents a pie chart of the distribution of the predictive analytics model.
[Figure 1: pie chart titled "The Distribution Of Predictive Analytics In O&G Field". Classification 53%; Prediction 34%; Clustering 13%.]

Figure 1. Distribution of the predictive analytics model in the O&G field.

Figure 1 illustrates the three categories of predictive analytics applied in the reviewed studies using ML and AI techniques. Clustering accounts for about 13% of the studies. Many studies do not require clustering because enough labeled data are available for supervised learning, which leads to 53% of researchers favoring classification.

Recently, modern artificial intelligence models, such as ANN, Deep Learning (DL), Fuzzy Logic, Decision Tree (DT), RF, and hybrid models have been implemented to model the O&G domain, such as a review of 91 publications and a bibliography on the use of AI in the O&G field. Figure 2 shows that, in recent decades, this field of research has increased. Nevertheless, additional studies on predictive analytics models and datasets are required to identify the suitability of the model and dataset for incorporating diverse mathematical and statistical elements alongside heuristic and arithmetic methods. AI has been widely utilized in various fields, such as science [13–15], energy [16–18], and economics [19–21]. Some examples include ML techniques [22–24], ensemble techniques [25,26], soft computing techniques [27,28], statistical techniques [29], and fuzzy-based systems [30]. The effective application of AI in several O&G domains, such as gas [31], pipeline [32], crude oil [33], oxyhydrogen gas retrofit [34], and transformer oil [35], has received increased interest in the last few years.
Predicting the performance and production of O&G has consistently presented a challenge. The imperative to create resilient prediction methods is driven by the desire for enhanced financial viability and superior technical outcomes [36]. As a critical sector, the O&G industry faces complex challenges, ranging from volatile market conditions to operational uncertainties and safety concerns. Predictive analytics has the transformative potential to revolutionize operations, enhance efficiency, and mitigate risks.

Predictive analytics offers a powerful toolset to address these challenges and unlock numerous benefits. For instance, proactive decision-making by O&G engineers is made possible by operational efficiency from real-time data analysis. This helps organizations spot problems before they escalate, optimize resource utilization, and streamline processes. In addition, cost reduction can help O&G companies be cost-effective by optimizing resource allocation, reducing waste, and enhancing overall resource efficiency through insights from predictive analytics. Numerous studies have explored and documented AI's effectiveness in modeling O&G over the last three years. Many initial efforts comprised basic and conventional AI techniques, including perceptron-based Artificial Neural Networks (ANNs) [37–39].
[Figure 2: bar chart titled "Total Publications By Year". x-axis: Year (2021, 2022, 2023); bar labels 34, 32, and 25; total number of published papers = 91.]

Figure 2. Total of predictive analytics models in the O&G field by year.

The subsequent sections provide thorough descriptions and in-depth analyses of the utilization of ML models for O&G prediction. Given the detailed exploration in these sections, providing additional information on this topic in the form of a literature review would be redundant and unnecessary. While some comprehensive analyses of O&G modeling utilizing ML models have been conducted, like the most current research conducted by Taha and Mansour [40], it has been suggested that optimized machine learning techniques and data transformation methods can increase the precision of the faulty power transformer prediction for Dissolved Gas Analysis (DGA) in the O&G field. Additionally, the aim of this paper is to discuss the most recent advancements, progress, constraints, and difficulties related to complex AI techniques for O&G data management. Because of this, researchers, petroleum engineers, and environmentalists attracted by the possible uses of AI within the oil and gas industry represent the target audience for this article.
2. Predicted Analytics Models for O&G
2.1. Application of Artificial Neural Network Models
This model is a computational framework that imitates how data are processed and analyzed in the cognitive structure of humans [41]. Neural networks accumulate their understanding by identifying patterns and relationships in data through experiential learning [42]. The ANN's architecture consists of three essential elements, including input, process, and output, and its functionality is predominantly determined by the interconnections between these elements and the role of connections in natural processing [43]. An ANN aims to convert inputs into meaningful outputs [44]. Before being transmitted to the output layer, data are initially introduced into the input layer, which processes them before forwarding them to the hidden layer. Each layer is made up of neurons that resemble computational units. These neurons use activation functions such as sigmoid, linear, and tanh to analyze each data record. Several optimizers are available to improve neural network performance by iteratively adjusting network weights based on training data [44,45].
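The layer-by-layer flow described above can be sketched in a few lines of plain Python. This is only an illustrative forward pass, not a model from any of the reviewed studies: the network shape, the weights, and the helper names (`dense`, `forward`, `ACTIVATIONS`) are assumptions made up for the example.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Activation functions named in the text; the lookup table itself is illustrative.
ACTIVATIONS = {"sigmoid": sigmoid, "tanh": math.tanh, "linear": lambda x: x}


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron computes activation(w . x + b)."""
    act = ACTIVATIONS[activation]
    return [
        act(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass a data record through the layers in order (input -> hidden -> output)."""
    out = inputs
    for weights, biases, activation in layers:
        out = dense(out, weights, biases, activation)
    return out


# Hypothetical network: 3 inputs, 2 hidden tanh neurons, 1 linear output neuron.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], "tanh"),
    ([[1.0, -1.0]], [0.0], "linear"),
]
prediction = forward([1.0, 2.0, 3.0], layers)
```

An optimizer would then adjust the numbers in `layers` to reduce the error between `prediction` and the measured target; that training loop is left out here to keep the sketch short.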
The research has extensively explored the versatile application of ANN models for predicting O&G properties across diverse domains. Qin et al. [46] thoroughly explored non-temporal data from a buried gas pipeline, employing various algorithms that combine ANN and metaheuristic models, such as the Quantum Particle Swarm Optimization-Artificial Neural Network (QPSO-ANN), Weighted Quantum Particle Swarm Optimization-Artificial Neural Network (WQPSO-ANN), and Levy Flight Quantum Particle Swarm Optimization-Artificial Neural Network (LWQPSO-ANN). The study focused on predicting crater width, using parameters of buried pipelines such as pipe diameter (mm), operating pressure (MPa), cover depth (m), and crater width (m). In this work, LWQPSO-ANN outperformed other methods by more than 95%.
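To make the idea of pairing a swarm optimizer with a neural model concrete, the sketch below uses textbook particle swarm optimization to fit the two weights of a single linear neuron. It is not the QPSO, WQPSO, or LWQPSO variant used in [46]; the inertia and acceleration constants, the toy data, and the names `pso` and `mse_loss` are all assumptions chosen for illustration.

```python
import random


def mse_loss(params, data):
    """Mean squared error of a one-neuron linear model y = w * x + b."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def pso(loss, dim, n_particles=20, iters=100, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard (non-quantum) PSO; returns best position, best loss, loss history."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]          # personal best positions
    best_val = [loss(p) for p in pos]       # personal best losses
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far
    history = [g_val]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = loss(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
        history.append(g_val)
    return g_pos, g_val, history


# Made-up training data generated from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]
weights, final_loss, history = pso(lambda p: mse_loss(p, data), dim=2)
```

In a hybrid model of the kind reviewed here, the particle position would hold all network weights rather than just `w` and `b`, and the loss would be the network's prediction error on the training set.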
Meanwhile, in another study on non-temporal pipeline conditions, a range of ML algorithms, including ANN, Support Vector Machine (SVM), Ensemble Learning (EL), and Support Vector Regression (SVR), were used [47]. The investigation included elements impacting corrosion defect depth, such as CO2 levels, temperature, pH, liquid velocity, pressure, stress, glycol concentration, H2S levels, organic acid content, oil type, water chemistry, and hydraulic diameter. The emphasis on the ANN was evident, indicating that it is a skilled navigator of the complex network of variables affecting pipeline corrosion.

In the complicated landscape of well-data analysis, Sami and Ibrahim [48] utilized non-temporal datasets from Middle East fields, concentrating on vertical wells. Random Forest (RF), k-nearest Neighbors (KNN), and ANNs were used to predict the flowing bottom-hole pressure (Pwf) of vertical petroleum wells. The preference for the ANN spotlighted its efficacy in modeling intricate relationships within well data, as underscored by evaluation metrics such as the Mean Squared Error (MSE) and Coefficient of Determination (R²). The R² values of the proposed method for training and testing were 97% and 93%, respectively, significantly higher than those of the other models implemented in the study.
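Because MSE, RMSE, and R² recur throughout the studies summarized in this section, a short sketch of how they are commonly computed may help when comparing reported values. The function names and the sample values are hypothetical, and the formulas are the standard textbook definitions rather than anything specific to the cited papers.

```python
import math


def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: same units as the target variable."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up measured and predicted values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]

print(mse(y_true, y_pred))   # 0.25
print(rmse(y_true, y_pred))  # 0.5
print(r2(y_true, y_pred))    # approximately 0.8
```

RMSE is expressed in the units of the predicted quantity, so it can only be compared across studies when the targets share a scale, whereas R² is unitless and is often reported as a percentage, as in the studies above.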
Moreover, Qayyum Chohan et al. [49] constructed non-temporal datasets using ML algorithms like the ANN, Least Square Boosting (LSB), and Bagging for the prediction of oil using 2600 samples from oil shales. The input parameters used in the study were air molar flowrate, illite silica, carbon, hydrogen content, feed preheater temperature, and air preheater temperature. The model was evaluated with the Root Mean Squared Error (RMSE) metric and reached a reported coefficient of correlation of 99.6% for oil yield and 99.9% for carbon dioxide, emphasizing the applicability of ANNs in interpreting the complex factors influencing oil yield and carbon dioxide emissions in complex processes. The suggested model outperformed other models in terms of accuracy.

A set of ML methods, including NB, KNN, DT, RF, SVM, and ANN, were applied to 769 temporal data samples related to ocean slick signs in the surrounding area of the exploration site [50]. The study's emphasis on ANNs amidst this array of algorithms underscored its pivotal role in discerning sea-surface petroleum signatures. Although the specific parameters of the ocean slick signature were not explicitly stated, the study spotlighted the ANN's prowess in unraveling patterns related to oil detection in dynamic ocean conditions with an accuracy of 90%. However, the proposed model did not give significant results for classifying ocean slick signatures.
Several machine learning models were used in another study, including Partial Least Squares (PLS), Deep Neural Network (DNN), Feature Projection Model (FPM), Feature Projection-Deep Neural Network (FP-DNN), and Feature Projection-PLS (FP-PLS) [51]. The study looked at long-distance pipelines without considering time. The dataset consisted of 2093 samples, and the prediction task included characteristics such as the mixed oil length, inner diameter, pipeline length, Reynolds number, equivalent length, and actual mixed oil length. The assessment metric employed was RMSE, and the DNN model displayed an RMSE of 146%. This error rate was the highest and least convincing one, indicating that the model's prediction accuracy must be increased.

Utilizing the ASPEN HYSYS V11 process simulator, Mendoza et al. [52] used non-temporal analysis in crude oil processes. The study used the ANN and Genetic Algorithm (GA) to predict critical variables such as feed flow rate, gas product pressure, interstage gas discharge pressure, and centrifugal compressor isentropic efficiency, aiming to increase oil production. The ANN+GA model improved the performance of the predicted variable.

Shifting the focus to gas-phase pollutants, Sakhaei et al. [53] performed non-temporal research using proprietary data. The study used ANNs to estimate methanol, α-pinene, and hydrogen sulfide concentrations for gas-phase contamination removal in OLP-BTF and TLP-BTF. The ANN+PSO model, which used 104 samples, achieved an R² of more than 99%, indicating its effectiveness. Given these encouraging outcomes, the authors suggested improving the model for practical implementations.

ANN, Least Square Support Vector Machine (LSSVM), and Multi-Gene Genetic Programming (MGGP) were utilized in reservoir engineering to analyze temporal data for gas-assisted gravity drainage (GAGD) [54]. With various input parameters and 223 samples, the ANN model achieved an R² of 97% and an RMSE of 0.0520, whereas the suggested MGGP strategy returned 89% (R²) and 0.0846 (RMSE). The study demonstrated the superiority of the ANN technique in reservoir prediction tasks.
Moving on to transformer fault diagnosis in the temporal domain, Mao et al. [59] investigated DGA datasets by combining Multivariate Time Series clustering approaches and graph neural networks (GNNs). The study concentrated on clustering H2, CH4, C2H6, C2H4, C2H2, CO, and CO2 using 1408 samples to diagnose power transformer defects. The MTGNN model attained an impressive 92% accuracy, demonstrating its efficacy in the spatiotemporal area of power transformer problem detection.

In the context of non-temporal analysis within the field of crude oil, Wang et al. [33] studied contemporary research, employing an ANN and a hybrid Multilayer Perceptron with Backpropagation for prediction. The model used 172 samples and a variety of characteristics to estimate diffusion coefficients, including temperature, pressure, liquid viscosity, gas viscosity, liquid molar volume, gas molar volume, liquid molecular weight, gas molecular weight, and interfacial tension. The training and testing R² values were 88% and 89%, respectively, so the proposed Multilayer Perceptron with Backpropagation model had comparatively low accuracy, and the hybrid technique did not deliver the expected improvement.
Zhang et al. [55] experimented with temporal data from a crude oil gathering and transportation system, using the GA with a backpropagation neural network for prediction. The model produced outstanding results with 509 samples, including numerous factors linked to the system's temperature, pressure, and consumption, achieving 99% accuracy for energy and heat and 97% for power. The GA with a backpropagation neural network was highly effective in predicting the complicated dynamics of the crude oil system.

In cooperation with the Egyptian General Petroleum Corporation (EGPC), Ismail et al. [56] conducted a temporal study of drilling activities. The model used Multilayer Perceptron (MLP) and the ANN for clustering and classification tasks based on epochs, age, formation, lithology, and fields for predicting gas channels and chimneys. The MLP model achieved an RMSE of 0.10, indicating decreased error rates and surpassing other approaches for predicting drilling-related occurrences.
The Extreme Learning Machine (ELM), Elastic Net Linear, Linear Support Vector Regression (Linear-SVR), Multivariate Adaptive Regression Spline, Artificial Bee Colony, Particle Swarm Optimization (PSO), Differential Evolution (DE), Simple Genetic Algorithm, Grey Wolf Optimizer (GWO), and Exponential Natural Evolution Strategies (xNES) are some of the models that Goliatt et al. [57] used in the temporal domain of shale gas exploration within the YuDong-Nan shale gas field. To estimate total organic carbon, the DE+ELM hybrid model produced an acceptable RMSE of 0.497 using mineral inputs such as clay, K-feldspar, and pyrite. Nevertheless, GWO did not outperform the other approaches.

In the temporal field of reservoir engineering, specifically within the North Sea's "Gullfaks" field, an MLP-LMA model was suggested by Amar et al. [58] to produce predictions for half-cycle time, shutdown, water alternating gas injection, and the amount of gas and water injected. The proposed approach outperformed the other two proxy models, achieving higher accuracy and much shorter simulation times. Table 1 lists research articles on predictive analytics in the O&G field using ANN models.

Table 1. A list of research articles on predictive analytics in the O&G field using ANN models.

Reference [46]
  Models: SVM, QPSO-ANN, WQPSO-ANN, and LWQPSO-ANN
  Temporality: Non-temporal
  Field: Pipeline
  Dataset: Buried gas pipeline; 99 samples
  Class: Prediction
  Input parameter: Pipe diameter (mm), operating pressure (MPa), cover depth (m), and crater width (m)
  Output parameter: Crater width
  Performance metrics: Map, R², MSE, RMSE, MAPE, and MAE
  Best model: LWQPSO-ANN
  Advantages/Disadvantages: The proposed method outperformed the other method by more than 95%.

Reference [48]
  Models: RF, KNN, and ANN
  Temporality: Non-temporal
  Field: Wells
  Dataset: Middle East fields: for vertical wells; 206 samples
  Class: Prediction
  Input parameter: Oil gravity (API), well perforation depth (depth (ft)), surface temperature (ST (F)), well bottom-hole temperature (BT (F)), flowing gas rate (Qg (Mscf/day)), flowing water rate (Qw (bbl/day)), production tubing internal diameter (ID (inches)), and wellhead pressure (Pwh (psia))
  Output parameter: Vertical oil wells' flowing bottom-hole pressure Pwf (psia)
  Performance metrics: MSE and R²
  Best model: ANN; R² = 97% (training) and 93% (testing)
  Advantages/Disadvantages: The suggested model had a much greater value than the other models.

Reference [49]
  Models: ANN, LSB, and Bagging
  Temporality: Non-temporal
  Field: Oil
  Dataset: Oil shale; 2600 samples
  Class: Prediction
  Input parameter: Air molar flowrate, illite silica, carbon, hydrogen content, feed preheater temp, and air preheater temp
  Output parameter: Petroleum output with CO2 emissions
  Performance metrics: RMSE
  Best model: RMSE oil yield = 99.6%; RMSE CO = 99.9%
  Advantages/Disadvantages: The suggested ANN model's precision outperformed the performance of the remaining models.

Reference [50]
  Models: NB, KNN, DT, RF, SVM, and ANN
  Temporality: Temporal
  Field: Oil
  Dataset: Ocean slick signature; 769 samples
  Class: Classification
  Input parameter: The data are confidential.
  Output parameter: Sea-surface petroleum signatures
  Performance metrics: Accuracy, sensitivity, specificity, and predictive values
  Best model: ANN; Accuracy = 90%
  Advantages/Disadvantages: The proposed model did not give significant results.

Reference [47]
  Models: ANN, SVM, EL, and SVR
  Temporality: Non-temporal
  Field: Pipeline
  Dataset: The data are confidential.
  Class: Classification
  Input parameter: CO2, temperature, pH, liquid velocity, pressure, stress, glycol concentration, H2S, organic acid, oil type, water chemistry, and hydraulic diameter
  Output parameter: Corrosion defect depth
  Performance metrics: MSE and R²
  Best model: EL, ANN, and SVR
  Advantages/Disadvantages: The proposed methods had a low error rate.

Reference [51]
  Models: PLS, DNN, FPM, FP-DNN, and FP-PLS
  Temporality: Non-temporal
  Field: Pipeline
  Dataset: Long-distance pipelines; 2093 samples
  Class: Prediction
  Input parameter: Mixed oil length, inner diameter, pipeline width, Reynolds number, equivalent length, and actual mixed oil length
  Output parameter: Mixed oil length
  Performance metrics: RMSE
  Best model: DNN; RMSE = 146%
  Advantages/Disadvantages: The error rate is not convincing and is the highest one.

Reference [52]
  Models: ANN and GA
  Temporality: Non-temporal
  Field: Crude Oil
  Dataset: ASPEN HYSYS V11 process simulator
  Class: Prediction
  Input parameter: Well, feed flow rate, the pressure of gas products, interstage gas discharge pressure, isentropic efficiency of centrifugal compressor
  Output parameter: Enhance petroleum production
  Performance metrics: R²
  Best model: ANN
  Advantages/Disadvantages: The performance of ANN+GA to enhance petroleum production is improved.

Reference [53]
  Models: ANN
  Temporality: Non-temporal
  Field: Gas
  Dataset: The data are confidential; 104 samples
  Class: Prediction
  Input parameter: Sulfur dioxide, methanol, and α-pinene
  Output parameter: The removal of gas-phase M, P, and H in an OLP-BTF and a TLP-BTF
  Performance metrics: R² and MSE
  Best model: ANN+PSO; R² > 99%
  Advantages/Disadvantages: The proposed model is good, and the author suggested improving the model with real-world applications.

Reference [54]
  Models: ANN, LSSVM, and MGGP
  Temporality: Temporal
  Field: Reservoir
  Dataset: Previous experimental and simulation studies; 223 samples
  Class: Prediction
  Input parameter: Height, dip angle, wetting phase viscosity, non-wetting phase viscosity, wetting phase density, non-wetting phase density, matrix porosity, fracture porosity, matrix permeability, fracture permeability, injection rate, production time, and recovery factor
  Output parameter: Gas-assisted gravity drainage (GAGD)
  Performance metrics: R², RMSE, MSE, ARE, and AARE
  Best model: ANN; R² = 97%; RMSE = 0.0520
  Advantages/Disadvantages: The ANN outperformed the proposed method (MGGP = 89% (R²) and 0.0846 (RMSE)).

Reference [59]
  Models: GNN and Multivariate Time Series
  Temporality: Temporal
  Field: Transformer
  Dataset: DGA; 1408 samples
  Class: Clustering
  Input parameter: H2, CH4, C2H6, C2H4, C2H2, CO, and CO2
  Output parameter: Power transformer fault diagnosis
  Performance metrics: Accuracy
  Best model: MTGNN; Accuracy = 92%
  Advantages/Disadvantages: The model was proven to be effective in its application.

Reference [33]
  Models: ANN and Multilayer Perceptron with Backpropagation
  Temporality: Non-temporal
  Field: Crude Oil
  Dataset: Recent literature; 172 samples
  Class: Prediction
  Input parameter: Pressure (P) [kPa], temperature (T) [C], liquid viscosity (uL) [c.p.], gas viscosity (uG) [c.p.], liquid molar volume (VL) [m³/kmol], gas molar volume (VG) [m³/kmol], liquid molecular weight (MWL) [kg/kmol], gas molecular weight (MWG) [kg/kmol], and interfacial tension (o) [Dyne]
  Output parameter: Diffusion coefficient (D) [m²/s]
  Performance metrics: MSE and RMSE
  Best model: Multilayer Perceptron with Backpropagation; R²: training dataset = 88%, testing dataset = 89%
  Advantages/Disadvantages: The suggested model had low accuracy. The hybrid model did not improve the model's accuracy.

Reference [55]
  Models: GA with a backpropagation neural network
  Temporality: Temporal
  Field: Crude oil
  Dataset: Crude oil gathering and transportation system; 509 samples
  Class: Prediction
  Input parameter: The inlet temperature of the combined system, outlet temperature of the combined system, inlet pressure of the combined system, outlet pressure of the combined system, inlet and outlet temperature of the transfer station system, inlet and outlet pressure of the transfer station system, inlet and outlet of the oil gathering wellhead system, treatment liquid volume, total power consumption, and total gas consumption
  Output parameter: Energy = 99%; Heat = 99%; Power = 97%
  Performance metrics: R²
  Best model: GA with a backpropagation neural network
  Advantages/Disadvantages: The model provided considerable results.

Reference [56]
  Models: MLP and ANN
  Temporality: Temporal
  Field: Drilling
  Dataset: Egyptian General Petroleum Corporation (EGPC); 1045 samples
  Class: Clustering and classification
  Input parameter: Epoch, age, formation, lithology, and fields
  Output parameter: Gas channels and chimney prediction
  Performance metrics: RMSPE
  Best model: MLP; RMSE = 0.10
  Advantages/Disadvantages: The proposed model had a lower error rate and outperformed the other method.
Sensors 2024, 24, 4013 9 of 57

Table 1. Cont.

Output Performance Advantages/


Reference Models Temporality Field Dataset Class Input Parameter Best Model
Parameter Metrics Disadvantages
ELM, Elastic Net
Linear, Linear-SVR, Acceptable
Multivariate Adaptive results for
The minerals were quartz,
Regression Spline, R2 , RMSE, MAE, hybrid ELM
YuDong-Nan calcite, dolomite, barite, Total organic DE+ELM =
[57] Artificial Bee Colony, Temporal Shale gas Prediction MAPE, MARE, models with the
shale gas field pyrite, siderite, clay, and carbon 0.497 (RMSE)
PSO, Differential and WI proposed
K-feldspar.
Evolution, Simple method, except
Genetic Algorithm, for GWO
GWO, and xNES
The proposed
model
Average outperformed
MLP and Radial Basis Injection rate for water,
Gullfaks in the Water absolute relative the other two
[58] Function Neural Temporal Reservoir Prediction gas, and half-cycle time. MLP-LMA
North Sea alternating gas deviation proxy models
Network Downtime.
(AARD) and significantly
reduced the
simulation time.
2.2. Application of Deep Learning Models

The DL framework appears among several complex models based on DL and ML with regard to
prediction accuracy [60]. It is more frequently utilized in algorithms for the life
prediction of O&G equipment [61]. A DL model consists of an input layer, hidden layers, and
an output layer. The parameters are assigned a value in the output layer using a neural
network [43]. The most commonly used Deep Learning algorithms in gas pipeline research are
the Convolutional Neural Network (CNN) and LSTM [61]. Figure 3 shows the internal structure
of the LSTM model. One of the main benefits of the LSTM model is its ability to keep
essential data for a longer period, so it can be applied to a wide range of tasks that
require long-term memory. However, there are several constraints to consider while using the
LSTM model. It is important to realize that increasing the number of factors makes training
more challenging [62].

Figure 3. Internal Structure of LSTM [62].

Figure 3 shows the processes of the input series in both backward and forward directions.
Bi-LSTM models can learn from the entire sequence context by collecting information about
each sequence element from the past and future. They are highly suited for temporal data and
producing precise predictions of ions in the sequence [62].
There are two transfer states in the LSTM model from Figure 3: a hidden state ($h_t$) and a
cell state ($c_t$) [62]. The passed $c_t$ changes quite slowly. The output $c_t$ is passed
from $c_{t-1}$ in the previous state, with some added values [62]. However, there are
typically significant variances in $h_t$ among nodes. The LSTM model uses the current input
$x_t$ and $h_{t-1}$ from the previous state to generate four states. Furthermore, $z_f$,
$z_i$, and $z_o$ are gating-control states with values between 0 and 1, derived by
multiplying the splicing vector by the weight matrix and converting it with a sigmoid
activation function. The tanh activation function converts $z$ to a value between −1 and 1
[62].
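The gate computation described above can be illustrated with a minimal NumPy sketch of a single LSTM step. The text does not spell out the state update, so the sketch assumes the standard LSTM equations (forget gate scales the previous cell state, input gate scales the candidate z, output gate scales tanh of the new cell state). The function name `lstm_step`, the stacked weight layout, and the dimensions are illustrative assumptions and are not taken from [62].

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step (assumed standard form).

    W has shape (4*H, H+D) and b has shape (4*H,), where H is the hidden
    size and D the input size. The four row blocks of W produce, in order,
    the forget gate, input gate, output gate, and candidate state.
    """
    H = h_prev.shape[0]
    spliced = np.concatenate([h_prev, x_t])   # splicing vector [h_{t-1}, x_t]
    a = W @ spliced + b                       # multiply by the weight matrix
    z_f = sigmoid(a[0:H])                     # forget gate, in (0, 1)
    z_i = sigmoid(a[H:2 * H])                 # input gate, in (0, 1)
    z_o = sigmoid(a[2 * H:3 * H])             # output gate, in (0, 1)
    z = np.tanh(a[3 * H:4 * H])               # candidate state, in (-1, 1)
    c_t = z_f * c_prev + z_i * z              # c_{t-1} plus "some added values"
    h_t = z_o * np.tanh(c_t)                  # hidden state passed to the next step
    return h_t, c_t, (z_f, z_i, z_o, z)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(scale=0.5, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):           # a sequence of five inputs
    h, c, gates = lstm_step(x_t, h, c, W, b)
```

Because the cell state is updated additively while the hidden state is re-gated at every step, the sketch is consistent with the observation that $c_t$ changes slowly whereas $h_t$ can vary considerably.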
This interest in Deep Learning is exemplified by a series of significant studies
showcasing its applications. The success of MLSTM in this context was evident through robust
evaluation metrics such as MAE and RMSE. Building on this, Werneck et al. [63] applied
temporal analysis to oil wells using 301 samples from the Metro Interstate Traffic Volume,
Appliances Energy Prediction, and UNISIM-II-M-CO datasets, utilizing LSTM, Gated Recurrent
Unit (GRU), and LSTM + Seq2Seq architectures to predict oil production and pressure. The
input parameters were bottom-hole pressure and the ratios between the produced fluids (oil,
gas, and water): water cut, gas–oil ratio, and gas–liquid ratio. Symmetric Mean Absolute
Percentage Error (SMAPE), RMSE, and MAE were the evaluation measures, which indicate how
well the models capture the dynamic characteristics of reservoirs. The LSTM + Seq2Seq and
GRU2 architectures were the best of the proposed models because they achieved the highest
accuracy. Nevertheless, the researchers recommend that future studies include another
metaheuristic method, such as the GA.
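For reference, the three error measures named above can be computed as in the following sketch. SMAPE has several variants in the literature; the form used here (mean of |error| divided by the average of |actual| and |predicted|, expressed in percent) is an assumption, and the paper does not state which variant [63] used. The sample values are hypothetical.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def smape(y_true, y_pred):
    """Symmetric MAPE in percent (assumed variant, range 0 to 200).

    Pairs where both values are zero contribute zero error.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2.0
        total += 0.0 if denom == 0 else abs(t - p) / denom
    return 100.0 * total / len(y_true)

# Hypothetical production values, for illustration only.
y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 300.0]
scores = {"MAE": mae(y_true, y_pred),
          "RMSE": rmse(y_true, y_pred),
          "SMAPE": smape(y_true, y_pred)}
```

MAE and RMSE are in the units of the target variable, whereas SMAPE is scale-free, which is one reason studies often report them together.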
In 2022, Wang et al. [61] shifted the focus to the Longmaxi Formation of the Sichuan
Basin with 90,000 data samples for predicting real-time pipeline cracks. The study
compared the DCNN + LSTM, ANN, LSTM, Recurrent Neural Network (RNN), and SVR
models for natural gas pipelines. The DCNN + LSTM performed best, with an accuracy of
99.37%, emphasizing the significance of LSTM in predicting shale gas production with robust
evaluation metrics in the temporal well data setting. Antariksa et al. [64] used the West
Natuna Basin dataset, which contains 11,497 samples, with input parameters such as deep and
shallow resistivities (LLD and LLS), sonic (Vp), neutron porosity (NPHI), density (RHOB),
and gamma ray (GR), and one output parameter, well log data imputation, to apply LSTM and RF
models to predict hydrocarbon production in the gas sector. This demonstrates that LSTM may
be applied to gas output forecasting, evaluated with metrics like R2, RMSE, and MSE. The
suggested model reached 94%, a greater accuracy than the other models.
Another study explored the classification of non-temporal oil transformer data using
the DGA local power utilities and IEC TC10 datasets with 1530 samples. The research
utilized KNN, SVM, and Extreme Gradient Boosting (XGBoost) and evaluated the models
using accuracy, precision, and recall. Combining the KNN with an oversampling method, the
Synthetic Minority Oversampling Technique (SMOTE), gave the best result: KNN+SMOTE reached
accuracies of 98% on the DGA dataset and 97% on the IEC TC10 dataset [65]. Barjouei et
al. [66] studied non-temporal data from the Soroush and South Iran oil fields, analyzing
7245 samples and using choke size (D64), wellhead pressure (Pwh), oil specific gravity (γo),
and gas/liquid ratio to predict wellhead choke flow rates. The study compared DL, DT, RF,
ANN, and SVR models and found that DL performed best, with an R2 of 99%. Together, these
studies highlight the adaptability of Deep Learning methods to handle temporal and
non-temporal data in various O&G sector applications. The insights derived from these
endeavors, specifically focusing on Deep Learning, contribute significantly to optimizing
operations and decision-making processes in this critical industry.
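To make the KNN+SMOTE combination more concrete, the sketch below shows the core idea of SMOTE-style oversampling (a synthetic minority sample is an interpolation between a minority sample and one of its nearest minority neighbours) followed by a plain majority-vote KNN classifier. It is a simplified illustration in NumPy under stated assumptions, not the pipeline of [65]; the function names, the toy data, and the parameter values are hypothetical.

```python
import numpy as np

def smote_like(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the basic SMOTE idea, simplified)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)              # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(n)                     # pick a minority sample
        nb = neighbours[i, rng.integers(k)]     # pick one of its neighbours
        lam = rng.random()                      # interpolation factor in [0, 1)
        synthetic[j] = X_min[i] + lam * (X_min[nb] - X_min[i])
    return synthetic

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training samples."""
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dist)[:k]
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]

# Toy imbalanced data: 8 majority samples (class 0), 3 minority samples (class 1).
X_maj = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                  [0.5, 0.5], [0.2, 0.8], [0.8, 0.2], [0.3, 0.3]], dtype=float)
X_min = np.array([[5.0, 5.0], [5.5, 5.2], [4.8, 5.6]])
X_syn = smote_like(X_min, n_new=5)

X_train = np.vstack([X_maj, X_min, X_syn])
y_train = np.array([0] * len(X_maj) + [1] * (len(X_min) + len(X_syn)))
label = knn_predict(X_train, y_train, np.array([5.1, 5.3]))
```

Oversampling is applied to the training set only; evaluating on synthetic samples would overstate the reported accuracy.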
In the temporal reservoir domain, a study of the Volve and UNISIM-IIH oil fields
utilized Long Short-Term Memory (LSTM) and GRU models for the classification
of 3257 samples based on oil, gas, water, or pressure levels [67]. With an R2 of 99%, the
GRU model emerged as the leading model for O&G forecasting. This accuracy demonstrates the
effectiveness of the suggested GRU model in predicting O&G activity within the given
reservoir setting. In a non-temporal analysis within the well domain, Wang et al. [68]
applied various Faster R-CNN models, including Faster R-CNN_Res50, Faster R-CNN_Res50_DC,
and Faster R-CNN_Res50_FPN, along with methods involving Edge detection and
Cluster+Soft-NMS, to Google Earth Imagery encompassing 439 samples. Their goal was to
cluster oil wells by width and height. The Faster R-CNN model with ClusterRPN obtained 71%
precision. It is important to note that the suggested approach was less than 90% accurate
and required more time to run than the other models. Table 2 summarizes the published
research on Deep Learning models for O&G predictive analytics.

Table 2. Summary of the published research on Deep Learning models for predictive analytics in the O&G field.

| Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages |
|---|---|---|---|---|---|---|---|---|---|---|
| [63] | LSTM and GRU | Temporal | Reservoir | Metro Interstate Traffic Volume dataset, Appliances Energy Prediction dataset, and UNISIM-II-M-CO; 301 samples | Prediction | Fluid production (oil, gas, and water), pressure (bottom-hole), and their ratios (water cut, gas–oil ratio, and gas–liquid ratio). | Oil production and pressure | MAE, RMSE, and SMAPE | LSTM + Seq2Seq and GRU2 architectures | The author suggested looking at another metaheuristic method, such as GA. |
| [61] | DCNN + LSTM, ANN, SVR, LSTM, and RNN | Temporal | Pipeline | Real-time pipeline crack; 90,000 data samples | Prediction | Pipeline condition, label, crack size, data length, sampling frequency, and tube pressure | Natural gas pipeline crack | RMSE, MAPE, MAE, MSE, and SNR | Optimized DCNN + LSTM; Accuracy = 99.37% | The model showcased impressive performance. |
| [64] | LSTM, Bi-LSTM, and GRU | Temporal | Well | West Natuna Basin dataset; 11,497 samples | Prediction | GR, Vp, LLD, LLS, NPHI, and RHOB | Well log data imputation | MAE, RMSE, MAPE, and R2 | LSTM; RMSE = 94% | The suggested model provided a greater accuracy. |
| [65] | KNN, SVM, and XGBoost | Non-temporal | Transformer | DGA local power utilities and IEC TC 10 dataset; 1530 samples | Classification | F7, F10, F17, F18, F19, F21, F24, F34, F36, and F40 | Transformer faults | Accuracy, precision, and recall | KNN + SMOTE; Accuracy: DGA = 98%, IEC TC 10 = 97% | The proposed model outperformed the other model. |
| [66] | DL, DT, RF, ANN, and SVR | Non-temporal | Reservoir | Sorush oil field and oil field in southern Iran; 7245 samples | Prediction | Measure choke size (D64), wellhead pressure (Pwh), oil specific gravity (γo), and gas–liquid ratio (GLR). | Wellhead choke flow rates | RMSE and R2 | DL; R2 = 99% | Compared to the other model, the accuracy of the suggested model was greater. |
| [67] | LSTM and GRU | Temporal | Reservoirs | UNISIM-IIH and Volve Oilfield; 3257 samples | Classification | Oil, gas, water, or pressure | Oil and gas forecasting | SMAPE and R2 | GRU; R2 = 99% | The proposed model had the highest accuracy. |
| [68] | Faster R-CNN_Res50, Faster R-CNN_Res50_DC, Faster-R_CNN_Res50_FPN with Edge Detection, and Cluster+Soft-NMS | Non-temporal | Well | Google Earth Imagery; 439 samples | Clustering | Width and height | Clustered oil wells | Precision, recall, F1 score, and AP | Faster R-CNN with ClusterRPN = 71% | The proposed method's running time was higher than the other models, and its accuracy was less than 90%. |

2.3. Application of Fuzzy Logic and Neuro-Fuzzy Models


A neuro-fuzzy model is a hybrid model that leverages the respective advantages of two
paradigms by combining them: Fuzzy Logic (FL) and ANNs [43]. Throughout
several consecutive generations, FL's function is to dynamically modify the crossover and
mutation rates [69]. The ANN and FL were combined to develop the well-known Adaptive
Neuro-fuzzy Inference System (ANFIS) model [70]. In ANFIS, a neural network receives
input from a fuzzy inference system. The ANFIS model is also computationally feasible,
reducing the training time of the neural network [70].
The ANFIS model was used to forecast the rupture pressure of a defective pipe from
the pipeline diameter, burst pressure, pipe wall thickness, defect depth, and
defect width, and gave acceptable results, with RMSE, Mean Absolute Error
(MAE), and R2 values of 98%, 69%, and 99%, respectively [71]. The proposed ANFIS+Principal
Component Analysis (PCA) method outperformed the other models and
significantly improved the model's accuracy. Another study on O&G predictive analytics
focused on clustering and used the ANN, SVR, and ANFIS to predict oil recovery from a
heterogeneous reservoir under a 5-spot waterflood [44]. The study used 9000 non-temporal
samples from a reservoir in Saudi Arabia, including the degree of reservoir heterogeneity
(V), mobility ratio (M), permeability anisotropy ratio (kz/kx), wettability indicator (WI),
production water cut (fw), and oil/water density ratio (DR) data to predict the waterflood's
mobile oil recovery efficiency (RFM). The ANN had better accuracy than the other models,
with MAPE, MAE, MSE, and R2 values of 5.1666%, 0.0093, 0.0003, and 0.997, respectively,
reducing the runtime by 0.8470 min.
In contrast, only a small number of studies, such as [72], examined the application of
ANFIS in predictive analytics in the O&G sector. That study explored alternative ML models
such as ANFIS to model and maximize the oil adsorption capacity of functionalized
magnetic nanoparticles. Besides ANFIS, the study also employed the Least Squares
Support Vector Machine (LSSVM) hybridized with a metaheuristic, the Cuckoo Search
Algorithm (LSSVM-CSA), and Gene Expression Programming for
non-temporal predictions using oil data. The study used mixing time
(min), MNP dosage (g/L), and oil concentration (ppm) to predict oil adsorption capacity
(mg/g adsorbent). A comparison of the ANFIS, LSSVM-CSA, and Gene Expression Programming
showed that LSSVM-CSA achieved the highest accuracy, with an R2 of 99%, outperforming the
other two models. Another study revealed the viability of the
Control Chart and RF for failure detection [73]. It used 50,000 temporal samples from the
3W dataset. The samples, derived from real-time well sensors, consisted of the P-PDG,
T-PDG, and T-PCK variables and were grouped into three classes: "normal", "fault", and
"high fault". Combining the Control Chart and RF methods showed higher sensitivity (99%) and
specificity (100%). Table 3 summarizes previously published research on Fuzzy Logic and
Neuro-fuzzy modeling in predictive analytics in the O&G field.
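Sensitivity and specificity, the two measures quoted for the failure detection study, are both derived from the confusion matrix of a binary classifier. The sketch below shows the standard definitions; the labels are hypothetical and are not taken from the 3W dataset, and treating "fault" as the positive class is an assumption.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): share of actual faults that are detected.
    specificity = TN / (TN + FP): share of normal cases that are not flagged.
    """
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Hypothetical labels: 1 = fault, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

In failure detection, a missed fault (low sensitivity) and a false alarm (low specificity) carry different operational costs, so the two values are usually reported separately rather than folded into a single accuracy figure.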

Table 3. Published research on Fuzzy Logic and Neuro-fuzzy modeling in predictive analytics in the O&G field.

| Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages |
|---|---|---|---|---|---|---|---|---|---|---|
| [72] | ANFIS, LSSVM-CSA, and Gene Expression Programming | Non-temporal | Oil | The data are confidential. | Prediction | Mixing time (min), MNP dosage (g/L), and oil concentration (ppm) | Oil adsorption capacity (mg/g adsorbent) | R2, MPE, and MAPE | LSSVM-CSA; R2 = 99% | The proposed method outperformed the other two models. |
| [71] | ANFIS and ANFIS+PCA | Non-temporal | Pipeline | Published studies [74–78]; 217 samples | Classification | Pipe dimension, burst pressure, pipe wall thickness, defect depth, and defect width | Pressure | RMSE, MAE, and R2 | ANFIS+PCA; R2 = 99% | The proposed method outperformed other models and significantly improved the model's accuracy. |
| [44] | ANN, SVR, and ANFIS | Non-temporal | Reservoir | CPG's waterflooding research group at the King Fahd University of Petroleum and Minerals in Saudi Arabia; 9000 samples | Clustering | Reservoir heterogeneity degree (V), mobility ratio (M), permeability anisotropy ratio (kz/kx), wettability indicator (WI), production water cut (fw), and oil/water density ratio (DR) | The effectiveness of moveable oil recovery during a flood (RFM) | MAPE, MAE, MSE, and R2 | ANN | The proposed model had a better accuracy than the other models and had a lower runtime and cost. |
| [73] | RF, Fuzzy C Means, and Control Chart | Temporal | Well | 3W dataset; 50,000 samples | Classification | P-PDG, T-PDG, and T-PCK, and grouping of three classes ("normal", "fault", and "high fault") | Failure detection applications | Total variance | Control chart + RF; Specificity = 99%, Sensitivity = 100% | The proposed method showed higher sensitivity and specificity. |

2.4. Application of Decision Tree, Random Forest, and Hybrid Models


Considerable attention has been given to integrating AI and a variety of ML models
within the O&G sector, with implications for reservoir engineering, pipeline integrity,
drilling, and transformer defect prediction. DT can handle categorical and numerical
information [79]. In several research publications, DT was used to develop models that
predict output variable values based on multiple input variables, with decisions
determined by the data the model was trained on [80]. In
the area of pipeline failure risk prediction, Mazumder et al. [81] extended non-temporal
applications by employing an array of models, including the KNN, DT, RF, Naïve Bayes
(NB), AdaBoost, XGBoost, Light Gradient Boosting Machine (LGBM), and CatBoost. The
study classified pipelines by failure risk based on their diameter, wall thickness, defect
depth, fault length, yield strength, ultimate tensile strength, and operational pressure.
The dataset, from the National Science Foundation's Critical Resilient Interdependent
Infrastructure Systems and Processes program, has 959 samples. An
evaluation based on precision, recall, and mean accuracy identified XGBoost as
the preferred model. However, the proposed model's accuracy of 85% leaves room for
improvement.
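How a DT derives a decision from training data can be shown with a single split. The sketch below searches one numerical feature for the threshold that minimises the weighted Gini impurity of the two resulting groups, which is the criterion used by CART-style trees; whether the cited studies used Gini or another criterion is not stated, so this choice is an assumption. The defect depths and risk labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best split of the form
    value <= threshold on one numerical feature. The threshold is None when
    no split improves on the unsplit impurity."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_impurity = None, gini(labels)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue                              # no boundary between equal values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2.0
        left = [y for v, y in pairs if v <= threshold]
        right = [y for v, y in pairs if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity

# Hypothetical defect depths (mm) and failure-risk labels, for illustration only.
depth = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0]
risk = ["low", "low", "low", "high", "high", "high"]
threshold, impurity = best_split(depth, risk)
```

A full tree repeats this search over every feature and recursively on each resulting group; ensemble methods such as RF and XGBoost build many such trees and combine their outputs.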
Liu et al. [82] evaluated a variety of models to address non-temporal pipeline failure
defects with 1500 samples from well log data from North China, including the LR, Stochastic
Gradient Descent, SVM, Gaussian Process Regression (GPR), Binary Search Tree Ensemble,
Binary Decision Tree, Sine Window, and ANN. Their assessment criteria included MAE,
MSE, and RMSE, and the ANN achieved the best R2, 99% for training
and 96% for testing, demonstrating the effectiveness of these models in addressing pipeline
integrity problems. Shifting to reservoir engineering, Taha and Mansour [40]
utilized 542 samples of temporal well log data from North China, featuring parameters like
C2H2, C2H6, CH4, and H2. They compared ELM, SVM, KNN, DT, RF, and
EL, focusing on classifying power transformer faults. In this context,
EL achieved training and testing accuracies of 78% and 84%, respectively, so
the accuracy did not exceed 90%. Nevertheless, the researchers found that the best model's
results contributed significantly to the research. In the non-temporal domain, using
3147 samples from DGA, Saroja et al. [83] applied an array of models for transformer fault
classification, encompassing DT, Linear Discriminant Analysis (LDA), Gradient Boosting
(GB), Ensemble Tree, LGBM, RF, KNN, NB, ANN, and LR. The study used the gas parameters
from the DGA dataset, which were C2H2, C2H4, C2H6,
and CH4. With an accuracy of 99.29%, the Quadratic Discriminant Analysis
(QDA) model performed best and obtained the best precision among the classifier models.
Extending the scope to gas type classification in transformer fault scenarios, Raj
et al. [84] employed the DT model without comparing it to alternative models. Their
classification of fault types used features such as H2, CH4, C2H6, C2H4,
and C2H2, and the DT reached an accuracy of 62.9%, assessed using accuracy
and Area Under Curve (AUC). The model showed potential for predicting faults in
transformer oil, and the researchers recommended exploring refinements to enhance its
overall efficacy. In drilling applications, Aslam et al. [85] analyzed 1984
non-temporal samples from the 3W public database using several methods, including LR,
DT, RF, KNN, SMOTE, Explainable Artificial Intelligence (XAI), Shapley Additive
Explanation (SHAP), and Local Interpretable Model-Agnostic Explanation (LIME). Relevant
characteristics included P-PDG, P-TPT, T-TPT, P-MON-PCK, T-JUS, PCK, P-JUS-CKGL,
T-JUS-CKGL, and QGL. Their examination covered accuracy, recall, precision,
F1 score, and AUC, and RF performed best, with values
of 1.00%, 99.6%, 99.64%, 99.91%, and
99.77%, respectively. The proposed model yielded remarkable results.
Turan and Jaschke [86] used a dataset of 2000 samples labeled with undesirable
events, including P-PDG, P-TPT, T-TPT, P-MON-CKP, and T-JUS-CKP, to classify the 3W
dataset from a temporal perspective using various algorithms such as LDA, QDA, Linear SVC,
Logistic Regression (LR), Decision Tree (DT), RF, and Adaboost. The assessment
measures were F1 score and accuracy, and DT reached
a notable accuracy of 97%. However, feature selection increased training time rather than
improving accuracy. Notably, the proposed technique struggled to categorize class 2
due to limited data availability and label disputes based on estimated attributes. Another
study used the same dataset with one-directional CNN, RF, Graph
Neural Network (GNN), and QDA models [87]. RF achieved a mean accuracy of 95%. The
evaluation measures were F1 score, accuracy, precision, and recall. The
study found that increasing the number of time frames enhanced mean accuracy. The
temporal analysis of well data by Brønstad et al. [88] also focused
on 3W wells and employed RF and PCA. The combination of
RF and PCA achieved an accuracy of 90%. The accuracy of the suggested strategy was over
95% in each of the distinct classes, indicating that it is a valuable way to identify several
anomalous occurrences in well data.
Ben Jabeur et al. [89] used LGBM, CatBoost, XGBoost, RF, and a neural network to
assess a dataset of 2687 samples related to the temporal characteristics of WTI crude
oil prices. The classification task involved forecasting the movement of numerous
financial indicators in connection with oil prices, including green energy resources,
commodities such as gold, silver, petroleum, soybeans, platinum, and copper, the Dollar
Index, the Volatility Index, the Euro, the USD, and Bitcoin. Accuracy and Area Under the
Curve (AUC) were the assessment criteria. LGBM and RF fared better than the other
algorithms. The results imply that the suggested strategy is superior to
established methods in forecasting complicated relationships. Hassan Baabbad et al. [90]
investigated the prediction of CO2 levels in shale gas reserves, emphasizing non-temporal
factors. The study applied ML algorithms such as GB, RF, and Multiple Linear Regression
(MLR) to a dataset of 1400 samples with features such as horizontal wellbore length,
hydraulic fracture length, reservoir length, SRV fracture porosity, SRV fracture permeability,
SRV fracture spacing, total production time, and fracture pressure. Performance
was measured using MSE, and RF outperformed the other ML algorithms, which highlights
its usefulness for forecasting CO2 levels
in shale gas reserves.
Alsaihati et al. [91] evaluated RF, ANNs, and Fuzzy Networks
(FNs) on real-time well data with 8983 samples. The models were used
to estimate torque and drag from attributes including weight on bit, rotating velocity,
standpipe tension, hook load, and penetration rate. The assessment measures were
the correlation coefficient (R) and average absolute percentage error (AAPE). According to
the study, the recommended approach predicted torque and drag during drilling operations
more accurately, and the RF model outperformed the other two models. Next, Kumar and
Hassanzadeh [92] focused on the temporal elements of reservoir modeling using
a 2D STARS simulation. The study's goal was to forecast the efficacy of shale barriers in the
context of reservoir dynamics, and the ML technique used was RF. The dataset included
240 samples, with predictor factors such as effective formation compressibility,
volumetric heat capacity, and thermal conductivity for rock, water, oil, and gas. The
assessment measures were R2 and RMSE, and RF proved effective. The authors suggested
enhancing the proposed technique with more training data and features,
highlighting the prospect of improving the model's prediction performance with a larger
dataset and more relevant characteristics.
In addition, Ma et al. [93] completed a non-temporal analysis to forecast burst pressure
in full-scale corroded O&G pipelines. The study utilized RF, XGBoost, SVM, and LGBM.
The dataset included 314 samples with predictor factors such as depth, length, breadth,
wall thickness, pipe diameter, steel grade, and burst pressure. The assessment measures
were R2, RMSE, MAE, and MAPE. XGBoost achieved an R2 of 99% in training
and 98% in testing. The results suggested that the proposed hybrid model, presumably a
blend of two models, attained much higher levels. Canonaco et al. [94]
performed classification aimed at predicting internal corrosion on a pipeline dataset of
1700 samples with geometrical and fluid dynamical variables related to pipeline
infrastructures, such as odometry, latitude, longitude, elevation, length, flow regime,
pressure, mass flow rates, velocity, shear stress, and temperature. The non-temporal
analysis used XGBoost, SVM, and Neural Networks (NNs). XGBoost achieved an accuracy of 62%.
The study suggests that the proposed model's accuracy needs improvement, indicating the
potential for enhancements in accurately predicting internal corrosion in pipeline
infrastructures.
Several studies have been conducted in the crude oil domain, on topics such as corrosion
and oil. One study used RF and CatBoost to forecast corrosion rates, focusing on
non-temporal pipeline and crude oil datasets [95]. The data consisted of 3240 samples,
including predictors such as stream composition (NO2, NH2S, and NCO2), pressure, velocity,
and temperature. The assessment measures were R2, MSE, and MAE. CatBoost
outperformed the other models in training and testing, achieving an accuracy of
99.9%. The results indicate that the proposed model is more accurate in estimating corrosion
rates for the given pipeline data.
Meanwhile, another study in the same domain primarily used data from prior
studies on CO2–Oil Minimum Miscibility Pressure [96]. The researchers used many ML
models, such as XGBoost, CatBoost, LGBM, RF, Deep Multilayer Networks, Deep Belief
Networks, and Convolutional Neural Networks (CNNs). The collection comprised 310 samples
containing data on the N2 and C1 (mole percent of volatile) and CO2,
H2S, and C2-C5 intermediate crude oil fractions, reservoir temperature, average critical
injection temperature of the gas, and molecular weight of the C5+ oil fraction. The goal
was to determine the minimum miscibility pressure of the CO2–crude oil system. CatBoost
outperformed the other models, as evidenced by its R2 score of 99%. The results demonstrate
that the minimum miscibility pressure of the CO2–crude oil system can be computed precisely
using the suggested model.
A non-temporal analysis of a lithology dataset originating in the Pearl River Mouth
Basin was completed in the work by Zhu et al. [97]. An assortment of ML models was
employed to classify different lithologies, including Deep Forest (DF), DF + K-means, RF,
SVM, and Deep Neural Networks (DNNs). The collection included 601 samples from six
classes: limestone, mudstone, sandy mudstone, sandstone, siltstone, and grey siltstone.
Based on precision, recall, and Fβ measurements, DF + K-means obtained an accuracy of
90%. The study identified shortcomings in the baseline method, pointing out problems such
as noisy data, unsatisfactory minority class prediction, and insufficient labeled data. The
findings show the usefulness of DF + K-means in overcoming these issues and improving
lithology identification.
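The Fβ measure mentioned above combines precision and recall into one score, with β controlling how much more weight recall receives than precision. The sketch below uses the standard definition; the value of β used in [97] is not stated in this review, so the values shown are purely illustrative.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R).

    beta = 1 gives the usual F1 score; beta > 1 favours recall and
    beta < 1 favours precision. Returns 0.0 when both inputs are zero.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)

# Illustrative values only.
f1 = f_beta(0.5, 0.5)            # balanced precision and recall
f2 = f_beta(0.8, 0.4, beta=2.0)  # recall weighted more heavily
```

For imbalanced lithology classes, a recall-weighted Fβ penalises a classifier that misses minority classes, which plain accuracy would hide.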
Studies using temporal DGA datasets focus on transformer faults. One study
used RF and KNN to categorize defect types using 11,400 samples of input
parameters [35]. The KNN model attained an accuracy of 88%. Another study used
the same dataset with a combination of the Gaining-Sharing
Knowledge-Based Algorithm (GSK) and XGBoost (GSK-XGBoost) for
classification [20]. The GSK-XGBoost model scored 50% on accuracy, precision, recall,
F1-score, and beta-factor using 128 samples of gas compositions. One factor that may have
affected the model's performance is the variety of gas components and compositions in the
DGA dataset, such as ammonia, acetaldehyde, acetone, ethylene, ethanol, toluene,
acetylene, ethylene, ethane, methane, and hydrogen. The study also observed
an increase in processing time, even with the devised approach. The proposed models'
accuracy in both studies did not reach 90%. The findings show a trade-off between
computing efficiency and accuracy, emphasizing the need for a better optimization
solution.
Another DGA study, using non-temporal analysis to classify fault types, worked with a
dataset of 796 samples covering gases such as H2, CH4, C2H2, C2H4, and C2H6 [98]. The
LGBM outperformed the other ML models, including XGBoost, RF, LR, SVM, NB, the KNN,
and DT, at fault type identification. The evaluation measures were F1 score, accuracy,
precision, and recall, and the LGBM achieved an accuracy of 87.06%. The study concluded
that the models, particularly the LGBM, were competent at fault type classification based
on the DGA data; however, the accuracy still needs to be improved.
The non-temporal analysis by Tewari et al. [8] focused on drilling operations,
particularly drill bit selection in Norwegian wells. The researchers used several ML models,
including AdaBoost, RF, the KNN, NB, MLP, and the SVM. The dataset comprised 4312
samples with a wide range of drilling-related features: torque, standpipe pressure, mud
weight, true vertical depth, weight on bit, measured depth, rate of penetration, revolutions
per minute, bit type, bit size, d-exponent, total flow area, mechanical specific energy, depth
of cut, and aggressiveness of the drill bit. The classification task was drill bit selection, and
the RF model achieved an accuracy of 91% in testing and 97% in training. The results
suggest that the proposed method is more stable, accurate, and dependable than the other
models used for drill bit selection in Norwegian wells.
Santos et al. [99] carried out a temporal analysis of well data, specifically from the 3W
dataset. They applied an RF model for classification to a dataset of 1984 samples. The
dataset included parameters such as the gas lift choke pressure, downstream temperature,
and gas lift flow. Performance was evaluated using accuracy, faulty-normal accuracy
(FNACC), and real faulty-normal accuracy (RFNACC), and the model reached an accuracy
of 94%. The study concludes that the proposed method is effective at identifying early
faults in the well data.
A temporal analysis of reservoir data [100] clustered sonic (DTC) logs using 37 samples
from the well log. The features included depth, gamma ray, shallow resistivity, deep
resistivity, neutron, density, and CALI. The hybrid technique, K-Means+RF, achieved R²
values ranging from 92% to 98%, outperforming the baseline approaches in the study, such
as the SVM, Local Outlier Factor (LOF), and RF. In a temporal analysis of a large field- and
well-scale dataset from the United States, RF was used for clustering barrel of oil equivalent
(BOE) [101]. This experiment used 934 samples, and the features included API, on-stream
date, surface latitude and longitude, formation thickness, TVD, lateral length, total proppant
mass, total injected fluid volume, API gravity, porosity, permeability, TOC, Vclay, rate of
oil production, gas production, water production, GPI, and frac fluid. Nonetheless, the
research highlighted the need to improve accuracy: the RF model's testing and training
RMSE values were 17.49% and 7.25%, respectively, suggesting potential overfitting.
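The overfitting signal described above is a gap between training and testing error. A minimal sketch of that check follows; the helper names `rmse` and `generalization_gap` are illustrative assumptions, and only the two RMSE figures (7.25% and 17.49%) come from the reported results for [101].

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def generalization_gap(train_rmse, test_rmse):
    """Ratio of testing to training RMSE; values well above 1 hint at overfitting."""
    return test_rmse / train_rmse


# Reported RMSE values for the RF model in [101]: 7.25% (training), 17.49% (testing).
gap = generalization_gap(train_rmse=7.25, test_rmse=17.49)
print(round(gap, 2))
```

A ratio this far above 1 is consistent with the overfitting noted in the study, although no single threshold applies to every dataset.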
A temporal study of crude oil data compared several prediction models, including
LSTM, AdaBoost, LR, SVR, the DNN, RF, and adaptive RF [102]. Adaptive RF outperformed
the other models, with MAPE, MAE, MSE, RMSE, R², and Explained Variance Score (EVS)
values of 112.31%, 52%, 53%, 73%, 99%, and 99%, respectively. The study's findings
indicate that the advantages and disadvantages of the proposed model must be weighed,
because it runs for longer than the other models used in the study. Another study used RF
to classify decommissioning options in the O&G field, with 1846 samples from a public
O&G dataset [103]. Accuracy was reported for two feature sets, with a comparison between
RF, the KNN, NB, DT, and NNs. RF achieved the highest accuracies: 80.06% with the full
feature set and 80.66% with redundant features removed. However, the suggested approach
must be improved because the accuracy was below 90%.
In a non-temporal analysis of well logging data, RF with Analog-to-digital converters
was used for clustering, with 100 samples and features including neutron (CNL), gamma
ray (GR), density (DEN), and compressional slowness (DTC) [104]. The study's RMSE (9%),
MAE (6%), MAPE (0.031%), and MSE (86%) values indicate that the accuracy of the
clustering task could be improved. Further, a temporal analysis of pipeline data with climate
change components employed the KNN, Multilayer Perceptron Neural Network, multiclass
SVM, and XGBoost for classification [105]. The features included temperature, humidity,
and wind speed from 81 samples. XGBoost outperformed the other models with an accuracy
of 92%, leaving room for additional improvement.
Al-Mudhafar et al. [106] worked on well data using LogitBoost, GB, XGBoost, AdaBoost,
and the KNN for classification, with a lithofacies and well log dataset of 399 samples
that includes the following parameters: gamma ray (GR), caliper (CALI), neutron
(NEU), sonic transit time (DT), bulk density (DEN), deep resistivity (RES DEP), shallow
resistivity (RES SLW), total porosity (PHIT), and water saturation (SW). The XGBoost model
surpassed the other techniques with a Total Percent Correct (TPC) accuracy measure of
97%. Subsequently, Wen et al. [107] studied a non-temporal pipeline dataset using
recursive feature elimination and particle swarm optimization-AdaBoost for clustering.
The dataset included 3986 samples with information about landslide risk and long-distance
pipelines, with parameters comprising landslide susceptibility area (km²), percentage (%),
and number of historical landslides. The model attained 90% accuracy during training and
83% accuracy during testing, indicating that the accuracy of the proposed clustering
strategy must be improved.
The study by Otchere et al. [106,108], which focuses on the reservoir domain and uses the
non-temporal Equinor Volve Field datasets, employed two models: Bayesian Optimization
with XGBoost (BayesOpt-XGBoost) and XGBoost. The dataset comprised 2853 samples, and
the classification task used DT, GR, NPHI, RT, and RHOB as features to predict Vshale,
porosity, and water saturation (Sw). The evaluation metrics included RMSE and MAE. The
BayesOpt-XGBoost model achieved an overall accuracy of 93%, with a precision of 98%, a
recall of 86%, and a combined F1 score of 93%. Despite these encouraging outcomes, the
research indicates that there is room for improvement, as the suggested approach may not
be reliable enough to forecast every output variable. Lastly, a temporal drilling study that
used RF and DT emphasized the need for data confidentiality [109]. The prediction task
used weight on bit, drill string rotation speed, rate of penetration, and pump rate as
confidential features to forecast rock porosity. The RF model performed well, with an
accuracy of 99% in training and 90% in testing, demonstrating its robustness and
dependability in handling sensitive drilling data. The literature on the use of DT, RF, and
hybrid models is compiled in Table 4.

Table 4. Summary of the literature on the application of decision tree, random forest, and hybrid models.

| Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages |
|---|---|---|---|---|---|---|---|---|---|---|
| [81] | KNN, DT, RF, NB, AdaBoost, XGBoost, and CatBoost | Non-temporal | Pipeline | National Science Foundation (NSF) Critical Resilient Interdependent Infrastructure Systems and Processes (CRISP); 959 samples | Classification | Pipe diameter, wall thickness, defect depth, defect length, yield strength, ultimate tensile strength, and operating pressure | Pipeline failure risk | Precision, recall, and mean accuracy | XGBoost; Accuracy = 85% | The proposed model needs improvement in accuracy. |
| [82] | LR, RF, SVM, XGBoost, and ANN | Non-temporal | Reservoir | Well log data from North China; 1500 samples | Classification | CAL, CNL, AC, GR, PE, RD, RMLL, RS, SP, DEN, DTS, and SP | Shear wave travel time (DTS) | R² | XGBoost; R² = 99% (training) and 96% (testing) | The best model was significant. |
| [40] | ELM, SVM, KNN, DT, RF, and EL | Temporal | Transformer | DGA; 542 samples | Classification | C2H2, C2H6, CH4, and H2 | Power transformer faults | Mean accuracy | EN; Accuracy = 78% (training) and 84% (testing) | The proposed model's performance accuracy was not above 90%. |
| [83] | DT, LDA, GB, Ensemble Tree, LGBM, RF, KNN, NB, LR, QDA, Ridge, and SVM-Linear | Non-temporal | Transformer | DGA; 3147 samples | Classification | C2H2, C2H4, C2H6, and CH4 | Transformer faults | Accuracy, AUC, recall, precision, F1 score, Kappa, MCC, and processing runtime | QDA; Accuracy = 99.29% | The proposed method had the best accuracy classifier model. |
| [84] | DT | Temporal | Well | KG composition; 180 samples | Classification | KG, including hydrogen (H2), methane (CH4), ethane (C2H6), ethylene (C2H4), and acetylene (C2H2) | Incipient faults in transformer oil | Accuracy and AUC | DT; Accuracy = 62.9% | The current model exhibited potential, and we recommend exploring opportunities for refinement to enhance its overall efficacy. |
| [85] | LR, DT, RF, KNN, SMOTE, XAI, SHAP, and LIME | Non-temporal | Well | 3W; 1984 samples | Classification | P-PDG, P-TPT, T-TPT, P-MON-PCK, T-JUS, PCK, P-JUS-CKGL, T-JUS-CKGL, and QGL | Detect anomalies in oil wells | Accuracy, recall, precision, F1 score, and AUC | RF; Accuracy = 99.6%, recall = 99.64%, precision = 99.91%, F1 score = 99.77%, and AUC = 1.00% | The result of the proposed model was significant. |

Table 4. Cont.

| Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages |
|---|---|---|---|---|---|---|---|---|---|---|
| [86] | LDA, QDA, Linear SVC, LR, DT, RF, and AdaBoost | Temporal | Well | 3W dataset; 2000 samples | Classification | P-PDG, P-TPT, T-TPT, P-MON-CKP, and T-JUS-CKP | Undesirable events | F1 score and accuracy | DT; Accuracy = 97% | The feature selection did not boost accuracy, and training time was increased with feature selection. The proposed method struggled with class 2 due to limited data and mismatched labels from calculated features. |
| [110] | DT, ANN, SVM, LR, KNN, and NB | Temporal | Pipeline | External defects of pipelines in the United States; 7000 samples | Classification | The defect's length and breadth, and the pipeline's nominal thickness | Classification for pipeline corrosion | Accuracy | DT; Accuracy = 99.9% | The accuracy of the model was significant to the research. |
| [89] | LGBM, CatBoost, XGBoost, RF, and NN | Temporal | Crude oil | WTI crude oil; 2687 samples | Classification | Gold, silver, crude oil, platinum, copper, the dollar index, the volatility index, and the Euro Bitcoin: Green Energy Resources ESG | Oil prices | Accuracy and AUC | LGBM and RF | The proposed method indicated superiority over traditional methods. |
| [90] | GB, RF, and MLR | Non-temporal | Reservoir | Shale gas reservoirs; 1400 samples | Prediction | Horizontal wellbore length, hydraulic fracture length, reservoir length, SRV fracture porosity, permeability, spacing, pressure, and total production time | CO2 | MSE | RF | The best method surpassed the other method in ML. |
| [91] | RF, ANN, and FN | Temporal | Drilling | Real time Well-1 data; 8983 samples | Classification | Standpipe pressure (SPP), weight on bit (WOB), rotary speed (RS), flow rate (Q), hook load (HL), rate of penetration (ROP), and rotary speed (RS) | Torque and drag (T&D) | R and AAPE | RF | The proposed model had higher accuracy than the other two models. |

Table 4. Cont.

| Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages |
|---|---|---|---|---|---|---|---|---|---|---|
| [92] | RF | Temporal | Reservoir | 2D simulation in STARS; 240 samples | Prediction | Formation compressibility, volumetric heat capacity, rock, water, oil, and thermal conductivity | Shale barrier | R² and RMSE | RF | The author suggested incorporating more training data and features to improve the proposed method. |
| [93] | RF, XGBoost, SVM, and LGBM | Non-temporal | Pipeline | Full-scale corroded O&G pipelines; 314 samples | Prediction | Depth, length, and width of corrosion defects, wall thickness, pipe diameter, steel grade, and burst pressure | Burst pressure of gas and oil corroded pipelines | R², RMSE, MAE, and MAPE | XGBoost; R² = 99% (training) and 98% (testing) | The hybrid proposed model had significantly higher prediction accuracy. |
| [94] | XGBoost, SVM, and NN | Non-temporal | Pipeline | OLGA data and PIG data; 1700 samples | Classification | Geometrical parameters: start of odometry, end of odometry, latitude, longitude, elevation, and the length of bar. Water volumetric flow rate, continuous velocity, water film shear stress, hold-up, flow regime, pressure, total mass and volumetric flow rate, inclination, temperature, section area, gas mass and volumetric flow rates, gas velocity, wall shear stress, total water mass and flow rate (including vapor) | Internal corrosion in pipeline infrastructures | Mean accuracy and F1 score | XGBoost; Accuracy = 62% | The proposed model needs improvement in accuracy. |
| [95] | RF and CatBoost | Non-temporal | Pipeline | Crude oil dataset; 3240 samples | Prediction | Stream composition (NO2, NH2S, and NCO2), pressure (P), velocity (v), and temperature (T) | Corrosion rates | R², MSE, MAE, and RMSE | CatBoost; Accuracy = 99.9% (training and testing) | The proposed model's accuracy outperformed the other models. |

Table 4. Cont.

| Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages |
|---|---|---|---|---|---|---|---|---|---|---|
| [35] | RF and KNN | Temporal | Transformer | DGA; 11,400 samples | Classification | Acetylene (C2H2), ethylene (C2H4), ethane (C2H6), methane (CH4), and hydrogen (H2) | Identify transformer fault types | Mean accuracy | KNN; Accuracy = 88% | The proposed model needs an improvement in accuracy. |
| [96] | XGBoost, CatBoost, LGBM, RF, deep MLN, DBN, and CNN | Non-temporal | Crude oil | Previous studies on CO2–oil MMP databank; 310 samples | Classification | Crude oil fractions (N2, C1, H2S, CO2, and C2-C5), average critical injection gas temperature (Tcave), reservoir temperature (Tres), and molecular weight of C5+ fraction (MWc5+) | Estimating the MMP of CO2–crude oil system | ARD, AARD, RMSE, MPa, and SD | CatBoost; R² = 99% | The proposed model confirmed its superiority over other models. |
| [97] | DF + K-means, RF, SVM, DNN, and DF | Non-temporal | Lithology | Lithology dataset from the Pearl River Mouth Basin; 601 samples | Classification | Sandstone (S00), siltstone (S06), grey siltstone (S37), mudstone (N00), sandy mudstone (N01), and limestone (H00) | Lithology identification | Precision, recall, and Fβ | DF + K-means; Accuracy = 90% | The baseline method had poor prediction of the minority class, small-amount data label, error labeling, and noisy data. |
| [20] | GSK-XGBoost | Temporal | Transformer | DGA; 128 samples | Classification | Ammonia, acetaldehyde, acetone, ethylene, ethanol, and toluene | Ethanol, ethylene, ammonia, acetaldehyde, acetone, and toluene | Accuracy, precision, recall, F-measure, and beta-factor | GSK-XGBoost; Mean accuracy = 50% | The accuracy of the GSK-XGBoost model fell below 90% after employing the developed strategy, while computational time increased. |
| [98] | LGBM, XGBoost, RF, LR, SVM, NB, KNN, and DT | Non-temporal | Transformer | DGA; 796 samples | Classification | H2, CH4, C2H2, C2H4, and C2H6 | Fault type classification | Accuracy, precision, recall, and F1 score | LGBM; Accuracy = 87.06% | The model demonstrated a high level of competence. |

Table 4. Cont.

| Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages |
|---|---|---|---|---|---|---|---|---|---|---|
| [8] | AdaBoost, RF, KNN, NB, MLP, and SVM | Non-temporal | Drilling | Drill bit type in Norwegian wells; 4312 samples | Classification | Measured depth, true vertical depth, rate of penetration, weight on bit, revolutions per minute, torque, standpipe pressure, mud weight, flow rate, total gas, bit type, bit size, d-exponent, total flow area, mechanical specific energy, depth of cut, and aggressiveness of the drill bit | Drill bit selection | Accuracy, precision, F1 score, recall, MCC, and G-mean | RF; Accuracy = 97% (training) and 91% (testing) | The proposed method was more reliable, stable, and accurate than previous models. |
| [99] | RF | Temporal | Well | 3W; 1984 samples | Classification | P-PDG, P-TPT, P-PCK, T-PCK, P-JUS-CKGL, T-JUS-CKGL, and gas lift flow | Early fault detection | Accuracy, faulty-normal accuracy (FNACC), and real faulty-normal accuracy (RFNACC) | RF; Accuracy = 94% | The proposed method had good detection of the early fault. |
| [87] | One-Directional, CNN, RF, GNN, and QDA | Temporal | Well | 3W; 1984 samples | Classification | P-PDG, T-TPT, P-MON-CKP, T-JUS-CKP, P-JUS-CKGL, and QGL | Anomalous events in oil | Accuracy, precision, recall, and F1 score | RF; Mean accuracy = 95% | The time windows increased. |
| [88] | RF and PCA | Temporal | Well | 3W; 1984 samples | Classification | P-PDG, P-TPT, T-TPT, P-MON-CKP, and T-PCK | Anomalous events in oil wells | Accuracy | RF+PCA; Accuracy = 90% | The proposed method's accuracy > 95% for all the classes. |
| [100] | SVM, LOF, and RF | Temporal | Reservoir | Well log data; 37 samples | Clustering | Depth, gamma ray, shallow resistivity, deep resistivity, neutron, density, CALI, and DTS | Sonic (DTC) | R² | K-Means+RF; R² = 0.92 to 0.98 | The proposed hybrid approach outperformed several baseline methods. |

Table 4. Cont.

| Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages |
|---|---|---|---|---|---|---|---|---|---|---|
| [101] | RF | Temporal | Well | Field and well dataset from public dataset U.S. well; 934 samples | Clustering | API, on-stream date, surface latitude and longitude, formation thickness, TVD, lateral length, total proppant mass, total injected fluid volume, API gravity, porosity, permeability, TOC, VClay, oil production rate, gas production rate, water production rate, GPI, and frac fluid | Barrel of oil equivalent (BOE) | RMSE and R² | RF; RMSE: Train = 7.25%, Test = 17.49% | The proposed method needs improvement in accuracy. The RF model was overfitting, and the accuracy of the proposed method must be improved. |
| [104] | RF with Analog-to-digital converters | Non-temporal | Well | Well logging dataset; 100 samples | Clustering | Neutron (CNL), gamma ray (GR), density (DEN), and compressional slowness (DTC) | Well logging data generation | RMSE, MAE, MAPE, and MSE | RF with analog-to-digital converters; RMSE = 9%, MAE = 6%, MAPE = 0.031%, and MSE = 86% | The proposed model needs improvement in accuracy for clustering. |
| [111] | RF | Temporal | Transformer | DPM1 and DPM2 for DGA; 2123 samples | Classification | H2 (hydrogen), CH4 (methane), C2H2 (acetylene), C2H4 (ethylene), C2H6 (ethane), CO (carbon monoxide), CO2 (carbon dioxide), O2 (oxygen), and N2 (nitrogen) | Transformer fault diagnosis | Accuracy | RF; Accuracy: DPM1 = 96.2%, DPM2 = 96.5% | For the evaluation dataset, the suggested models diagnosed errors with a satisfactory level of performance. |
| [105] | KNN, Multilayer Perceptron Neural Network, multiclass SVM, and XGBoost | Temporal | Pipeline | Climate change data; 81 samples | Classification | Location, time, pipeline age, pipeline material, temperature, humidity, and wind speed | Gas pipeline | Accuracy, precision, recall, and F1 score | XGBoost; Accuracy = 92% | The model outperformed other models; however, it needs improvement. |
| [106] | LogitBoost, GBM, XGBoost, AdaBoost, and KNN | Temporal | Well | Lithofacies and well log dataset; 399 samples | Classification | GR, CALI, NEU, DT, DEN, RES DEP, RES SLW, PHIT, and SW | Lithofacies predictions | Total Percent Correct (TPC), an accuracy measure | XGBoost; TPC = 97% | The model gave significant results for the proposed method. |

Table 4. Cont.

| Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages |
|---|---|---|---|---|---|---|---|---|---|---|
| [107] | Recursive feature elimination and particle swarm optimization-AdaBoost | Non-temporal | Pipeline | Changshou-Fuling-Wulong-Nanchuan (CN) gas pipeline dataset; 3986 samples | Clustering | Landslide susceptibility area, percentage, and historical landslides | Long-distance pipelines | Accuracy, sensitivity, precision, and F1 score | Recursive feature elimination and particle swarm optimization-AdaBoost; Accuracy = 90% (training) and 83% (testing) | The proposed model needs improvement in accuracy. |
| [101] | LSTM, AdaBoost, LR, SVR, DNN, RF, and adaptive RF | Temporal | Crude oil | United States' Energy Information Administration Brent COP data | Prediction | Shape, location, and scale | Crude oil price (COP) | MAPE, MSE, RMSE, MAE, and EVS | Adaptive RF; MAPE = 112.31%; MAE = 52%; MSE = 53%; RMSE = 73%; R² = 99%; and EVS = 99% | The proposed model outperformed the others; however, the running time was higher than those of the other models. |
| [109] | RF and DT | Temporal | Drilling | The data are confidential. | Prediction | WOB, torque, standpipe pressure, drill string rotation speed, rate of penetration, and pump rate | Rock porosity | R², AAPE, and VAF | RF; Accuracy = 99% (training) and 90% (testing) | The model stood out for its exceptional performance. |
| [108] | BayesOpt-XGBoost and XGBoost | Non-temporal | Reservoir | Equinor Volve Field datasets; 2853 samples | Classification | DT, GR, NPHI, RT, and RHOB | Vshale, porosity, horizontal permeability (KLOGH), and water saturation | RMSE and MAE | BayesOpt-XGBoost; Accuracy = 93%, precision score = 98%, recall score = 86%, and combined F1 score = 93% | The proposed method was not robust enough to predict all the output. |
| [103] | RF, KNN, NB, DT, and NN | Temporal | Transformer | New O&G decommissioning dataset from GitHub; 1846 samples | Classification | Dimensions, circumference, length, metal, plastic, concrete, residues, environmental expenses, and weight | Predictive decommissioning options | Recall, precision, F1 score, and AUC | RF; Accuracy: Full features = 80.06%, Redundant removed = 80.66% | The proposed method needs improvement. |

2.5. Application of Interrelated AI Models


The O&G industry has seen a significant increase in the use of AI models for more robust
predictive capabilities and better decision-making. As a kernel-based ML approach, the
SVR algorithm has an excellent non-linear modeling capacity and is frequently employed
for predictive analytics in O&G [112]. MLR analysis, one of the oldest and most widely
used methods, determines how a quantity depends on a set of independent factors. MLR
has several advantages: interpretability, simplicity, and capacity for varied adjustments
over time. Additionally, it permits inference based on homogeneity, normality, and the
intercorrelation between predictor variables and the error εp [113]. Expanding on AI
applications, Guo et al. [114] applied MLR, SVR, and GPR to non-temporal gas well data.
The study used 129 samples from the M6COND and M6GAS datasets to cluster the output
variable, the gas well, from input parameters including fluid volume, proppant amount,
cluster counts, stage counts, total horizontal lateral length, gas saturation, total organic
carbon content, and condensate–gas ratio. GPR emerged as the preferred model based on
RMSE and R². However, the accuracy of the proposed method needs improvement.
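To make the MLR description concrete, the sketch below fits an ordinary least-squares model by solving the normal equations in plain Python. It is an illustration under stated assumptions, not the implementation used in [113] or [114]: the function name `fit_mlr` and the toy data are hypothetical, and a production workflow would normally rely on a numerical library.

```python
def fit_mlr(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X is a list of feature rows (no intercept column); returns [b0, b1, ..., bk].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend the intercept term
    k = len(rows[0])
    # Augmented normal equations [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda i: abs(A[i][c]))
        if abs(A[pivot][c]) < 1e-12:
            raise ValueError("design matrix is singular (collinear predictors)")
        A[c], A[pivot] = A[pivot], A[c]
        for i in range(k):
            if i != c:
                factor = A[i][c] / A[c][c]
                for j in range(c, k + 1):
                    A[i][j] -= factor * A[c][j]
    return [A[i][k] / A[i][i] for i in range(k)]


# Toy, noise-free data generated from y = 1 + 2*x1 + 3*x2 (illustrative only).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * a + 3 * b for a, b in X]
print(fit_mlr(X, y))
```

The fitted coefficients are directly readable as the effect of each predictor, which is the interpretability advantage noted above.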
Ibrahim et al. [115] carried out a temporal study that classified oil, gas, and water from
1968 samples of O&G production in five well reservoirs owned by Saudi Aramco, using
parameters like location, contact, average permeability, volume, production, and the ratio
of wellhead to bottom-hole pressure. The study used a variety of AI models, including
XGBoost, the ANN, the RNN, MLR, Polynomial Linear Regression (PLR), SVR, Decision
Tree Regression (DTR), and RF Regression (RFR). Evaluation measures, including R², MAE,
MSE, and RMSE, showed that the RNN performed best, with scores of 98%, 87%, and 92%
for oil, gas, and water, respectively. The suggested model's output still needs to be
improved. In the non-temporal domain, the researchers employed an MLP, RF, and SVR
with parameters such as the impact of transportation interruption, safety, health,
environmental and ecological factors, and equipment maintenance to assess 149,940 input
samples from a historical record of pipeline failure [116]. The suggested approaches
produced the best-fitting results with the least computation time.
A non-temporal study of reservoir data used a dataset of 147 samples with reservoir
temperature, oil composition, and gas composition as inputs [117]; the target variable was
the minimum miscibility pressure between CO2 and crude oil. The assessment statistic was
MSE. The SVM model with the POLY kernel outperformed the other models in accuracy.
The results indicate that the SVM model with the POLY kernel is effective at identifying
the minimum miscibility pressure from the supplied reservoir data. Another temporal
analysis, by Marins et al. [22], focused on wells and used various ML models, including RF,
the ANN, LSTM, the Independent Recurrent Neural Network, and CatBoost, with 1984
samples to classify faults in oil well production. The features included P-PDG, T-TPT,
P-TPT, Initial Normal, Steady-state, and transient events. The ARN model achieved an
accuracy of 96%, a recall of 84%, and an F-measure of 85%. However, the research noted
that the best model was not robust, owing to misclassifications of undesirable events of
type 3 and type 8. This indicates the need for further refinement to make fault detection
and classification more robust for these specific events.
Regarding temporal pipeline analysis with an emphasis on Iranian oil fields, Naserzadeh
and Nohegar [118] presented an in-depth study using several SVR models enhanced by
GA, PSO, Firefly Algorithm (FA), Bat Algorithm, Cuckoo Optimization Algorithm (COA),
Grey Wolf Optimizer (GWO), Harmony Search (HAS), Imperialist Competitive Algorithm
(ICA), Shuffled Frog-Leaping Algorithm (SFLA), and Simulated Annealing (SA). The
models were used to forecast carbon steel corrosion rates using 340 samples and
characteristics such as pit depths, exposure period, operating pressure, and chemical
concentrations. The SVR-GA-PSO model outperformed the others, with an R² of 99%,
RMSE of 0.0099, MSE of 9.84 × 10⁻⁵, MAE of 0.008, RSE of 0.001, and EVS of 0.955.
Yuan et al. [119] applied Gradient Boosting DT, Physics-Based Bayesian Linear Regression
(PBBLR), Bayesian Linear Regression (BLR), and ANN in the non-temporal pipeline domain.
With 728 samples from the Supervisory Control and Data Acquisition (SCADA) system,
the models worked with factors such as the original length of mixed oil, transportation
distance, diameter, and Reynolds number. Although PBBLR is regarded as the superior
method, the assessment metrics, i.e., RMSE, MAE, and R², indicate that its accuracy
should be improved. These collective studies showcase the versatile applications of AI
models in addressing crucial challenges within the O&G industry, encompassing diverse
aspects such as predicting pipeline corrosion, gas well parameters, natural gas pipeline
failures, and O&G production outcomes. Incorporating innovative optimization techniques
underscores the industry's commitment to harnessing advanced technologies for enhanced
operational efficiency and robust risk management strategies. Table 5 contains previous
research published on interrelated AI models for predictive analytics in the O&G field.

2.6. Application of Statistical Models


A statistical model is a mathematical simulation of a system's behavior that represents
the relationships between one or more parameters. Regression and temporal analysis
are two statistical modeling techniques that take advantage of this minimization process.
Bivariate time series analysis differs from regression analysis that uses time as an
independent or predictor parameter. In regression, a bivariate analysis is carried out on
two or more statistically linked variables. Furthermore, the bivariate regression model
assumes that each measurement is independent; that is, the order of the predictor and
data pairings is not relevant in bivariate regression. Time series analysis, by contrast,
identifies and makes use of time dependency to improve the prediction accuracy or
understanding of the underlying physical processes [43]. Therefore, identifying temporal
patterns requires a deep understanding of mathematics. Temporal modeling techniques
that are commonly employed include autoregressive (AR), moving average (MA),
autoregressive moving average (ARMA), autoregressive integrated moving average
(ARIMA), and seasonal autoregressive integrated moving average (SARIMA) [120,121].
Several studies have explored diverse approaches in the domain of statistical methods for
predictive analytics in the O&G industry.
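As an illustration of how a time series model exploits time dependency, the sketch below fits the simplest autoregressive model, AR(1), written as x_t = c + φ·x_{t−1} + e_t, by regressing each observation on its predecessor. The function names `fit_ar1` and `forecast_next` and the toy series are assumptions made for this example; they are not drawn from [43] or [120,121], and ARMA, ARIMA, and SARIMA models add moving-average, differencing, and seasonal terms that are not shown here.

```python
def fit_ar1(series):
    """Estimate (c, phi) of x_t = c + phi * x_{t-1} + e_t by least squares."""
    previous, current = series[:-1], series[1:]
    n = len(previous)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    sxx = sum((a - mean_prev) ** 2 for a in previous)
    if sxx == 0:
        raise ValueError("series is constant; phi is not identifiable")
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(previous, current))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_next(series, c, phi):
    """One-step-ahead forecast from the last observed value."""
    return c + phi * series[-1]


# Toy, noise-free series generated with c = 2 and phi = 0.5 (illustrative only).
series = [10.0]
for _ in range(5):
    series.append(2.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
print(c, phi, forecast_next(series, c, phi))
```

Unlike the bivariate regression described above, shuffling the observations here would change the fit, because each pair is formed from consecutive time steps.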

Table 5. Previous research published on interrelated AI models for predictive analytics in the O&G field.

| Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages |
|---|---|---|---|---|---|---|---|---|---|---|
| [114] | MLR, SVR, and GPR | Non-temporal | Gas | M6COND and M6GAS; 129 samples | Clustering | Condensate–gas ratio, total horizontal lateral length, gas saturation, total organic carbon content, cluster and stage counts, proppant amount, fluid volume, and total horizontal lateral length | Gas well | RMSE and R² | GPR | The proposed method needs improvement in accuracy. |
| [115] | XGBoost, ANN, RNN, MLR, PLR, SVR, DTR, and RFR | Temporal | O&G production | Saudi Aramco of five well reservoirs; 1968 samples | Classification | Location, contact, average permeability, volume, production, pressure ratio between the wellhead and bottom-hole, and production | Oil, gas, and water | R², MAE, MSE, and RMSE | RNN; R²: Oil = 98%, Gas = 87%, Water = 92% | The proposed model needs improvement in output. |
| [116] | MLP, RF, and SVR | Non-temporal | Pipeline | History record of pipeline failure; 149,940 samples | Classification | Effects of transportation disruptions on safety and health, the environment and ecology, and equipment maintenance | Natural gas pipeline failure | RMSE, MAE, MSE, and R² | RF | The proposed methods had the shortest computing times and best-fitting results. |
| [117] | SVM | Non-temporal | Reservoir | MMP data; 147 samples | Classification | Reservoir temperature, oil composition, and gas composition | Minimum miscibility pressure of CO2 and crude oil | MSE | SVM-POLY kernel | The proposed model's accuracy outperformed the other models. |
| [22] | RF, ARN, LSTM, Independently Recurrent Neural Network, component-wise gradient | Temporal | Well | 3W; 1984 samples | Classification | P-PDG, T-TPT, P-TPT, Initial Normal, Steady-state, and transient | Oil well production | Accuracy, precision, recall, F score | ARN; Accuracy = 96%, Precision = 88%, Recall = 84%, F-measure = 85% | The proposed model was not robust due to misclassifications for undesirable events for type 3 and type 8. |
Sensors 2024, 24, 4013 31 of 57

Table 5. Cont.

Output Performance Advantages/


Reference Models Temporality Field Dataset Class Input Parameter Best Model
Parameter Metrics Disadvantages
SVR-GA-PSO, Onshore oil and gas
SVR, SVR-GA, pipelines: pit depths, SVR-GA-PSO
SVR-FA, SVR-PSO, exposure times, pitting R2 = 99%
SVR-ABC, start times, operational RMSE = 0.0099
MSE, RMSE, The proposed model
SVR-BAT, Iranian oil fields pressures, temperatures, Carbon steel MSE = 9.84 ×
[118] Temporal Pipeline Classification MAE, EVS, R2 , showed a better result
SVR-COA, 340 samples water cuts, redox corrosion rate 10−5
and RSE than the other ones.
SVR-GWO, potentials, resistivities, MAE = 0.008
SVR-HAS, pH, concentrations of RSE = 0.001
SVR-ICA, and sulfate and chloride ions, EVS = 0.955
SVR-SFLA and production rates
SCADA
The PBBLR method
(Supervisory
BLR, PBBLR, Diameter, Reynolds needs improvement
Control and
ANN, and number, transportation Actual mixed oil RMSE, MAE, on the accuracy of
[119] Non-temporal Pipeline Data Prediction PBBLR
Gradient Boosting distance, and mixed oil length and R2 using SCADA dataset
Acquisition)
DT length to predict actual mixed
system
oil length
728 samples

Liu et al. [122] examined the application of seasonal autoregressive integrated moving
average (SARIMA), LSTM, and autoregressive (AR) models. The researchers focused on
transformers, using a DGA dataset of 610 samples and considering parameters such as H₂,
CH₄, C₂H₄, C₂H₆, CO, CO₂, and total hydrocarbon (TH) to predict dissolved gas
concentrations. The evaluation metric, the Accuracy Relative Error (ARE), highlighted the
SARIMA model’s efficacy in capturing seasonal variations and long-term dependencies
within the transformer DGA dataset. Yang et al. [62] extended the exploration of statistical
methods to wells, employing LSTM and ARIMA models. Concentrating on the Longmaxi
Formation of the Sichuan Basin with 3650 data samples, they used date and daily production
data to forecast shale gas production. The evaluation metrics, including MAE, RMSE, and R²,
demonstrated the effectiveness of LSTM in capturing temporal dependencies and of ARIMA
in handling time series forecasting tasks. However, the model’s accuracy was 63% and needs
improvement. Moreover, Li et al. [123] contributed to the field of statistical methods,
specifically examining the Grey Model (GM), Fractional Grey Model (FGM), Data
Grouping-Based Grey Modeling Method (DGGM), ARIMA, PSO for Grey Model (PSOGM), and
PSO-based data grouping grey model with fractional order accumulation (PSO-FDGGM).
Their study, focusing on quarterly natural gas production in China, aimed to predict natural
gas production. MAPE served as the evaluation metric, and PSO-FDGGM proved the most
effective at optimizing the statistical models for accurate predictions, achieving a MAPE of
3.19%. The model’s performance was noteworthy and reliable.
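The error measures quoted for these studies (MAE, RMSE, R², and MAPE) can be illustrated with a minimal Python sketch. The observed and forecast values below are invented for illustration and do not come from any of the cited papers; the function names are likewise assumptions.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical production values and forecasts (illustrative only).
observed = [100.0, 110.0, 120.0, 130.0]
forecast = [98.0, 112.0, 119.0, 133.0]

print("MAE :", mae(observed, forecast))
print("RMSE:", rmse(observed, forecast))
print("R2  :", r2(observed, forecast))
print("MAPE:", mape(observed, forecast))
```

MAE and RMSE are expressed in the units of the target, whereas R² and MAPE are scale-free, which is one reason results reported with different metrics are hard to compare directly across studies.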
Collectively, these studies underscore the diverse applications of statistical methods in
predictive analytics for the O&G sector. SARIMA, LSTM, ARIMA, GM, FGM, DGGM,
AR, PSOGM, and PSO-FDGGM are recognized as effective tools for handling temporal
dependencies, forecasting production, and optimizing model parameters. The specifics
of the data and the nature of the predictive analytics task determine which statistical
approach is best, highlighting the need for a customized strategy in the O&G sector.
Table 6 highlights previous studies on statistical models for predictive analytics modeling
in the O&G field.
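To make the idea of exploiting time dependency concrete, the sketch below fits a first-order autoregressive model, y[t] = c + φ·y[t−1], by ordinary least squares and applies the seasonal differencing step that SARIMA-type models rely on. It is a simplified illustration on synthetic data, not the procedure used in the reviewed studies; the function names and the numbers are assumptions.

```python
def fit_ar1(series):
    """Least-squares estimate of (c, phi) in y[t] = c + phi * y[t-1]."""
    x = series[:-1]          # lagged values y[t-1]
    y = series[1:]           # current values y[t]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    phi = cov / var
    c = mean_y - phi * mean_x
    return c, phi

def seasonal_difference(series, period):
    """Subtract the value one season earlier: y[t] - y[t - period]."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

# Synthetic, noise-free AR(1) series generated with c = 2 and phi = 0.5.
series = [10.0]
for _ in range(20):
    series.append(2.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
one_step_forecast = c + phi * series[-1]
print("c =", c, "phi =", phi, "forecast =", one_step_forecast)

# Hypothetical quarterly values with a repeating pattern plus a yearly increase.
quarterly = [10.0, 20.0, 30.0, 40.0, 12.0, 22.0, 32.0, 42.0]
print(seasonal_difference(quarterly, period=4))
```

Seasonal differencing removes the repeating quarterly pattern and leaves only the year-on-year change, which is the component the non-seasonal part of the model then has to explain.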

2.7. Alternative ML Models Utilized for Predictive Analytics in the O&G


Several researchers have investigated various methods of developing ML models for
predictive analytics in the O&G sector. Rashidi et al. [124] investigated the Multi-Ensemble
Learning Machine-Genetic Algorithm, Multi-Ensemble Learning Machine-Particle Swarm
Optimization (MELM-PSO), Least Squares Support Vector Machine-Genetic Algorithm
(LSSVM-GA), and Least Squares Support Vector Machine-Particle Swarm Optimization
(LSSVM-PSO) for non-temporal predictions in crude oils. Their input parameters were
temperature (T), solution gas–oil ratio (Rs), gas specific gravity (γg), and oil gravity (API),
and the targets were the bubble point pressure and the oil formation volume factor,
with 638 samples of data from the crude oil database. The evaluation metrics, including
RMSE, highlighted the superiority of the MELM-PSO in optimizing model performance.
The proposed hybrid model outperformed the empirical method. Gong et al. [125] carried
out a temporal analysis of a gas leakage dataset. To classify gas pipeline leakage, the
researchers used a variety of ML models, including the CNN, Linear Support Vector
Machine (Linear SVM), Gaussian Support Vector Machine (Gaussian SVM), and a
combination model, i.e., SVM+CNN. The study utilized a dataset of 1000 samples covering
gas types such as methane, ethane, propane, isobutane, butane, helium, nitrogen, hydrogen
sulfide, and carbon dioxide. The assessment criterion was accuracy, and the SVM reached
95.5%. The study noted the model’s excellent performance, claiming that the SVM model
stood out for accurately estimating gas pipeline leakage using the available information.
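Hybrid models such as LSSVM-PSO or MELM-PSO pair a learner with a metaheuristic that searches for good hyperparameters. The sketch below shows a basic particle swarm optimizer in plain Python. The objective function is a hypothetical stand-in for a validation error (a simple quadratic bowl), and the inertia and acceleration coefficients are common textbook defaults rather than values taken from the reviewed papers.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic particle swarm optimization over a box-constrained search space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # personal best positions
    best_val = [objective(p) for p in pos]         # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]     # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

def validation_error(params):
    """Hypothetical validation error with its minimum at (1.0, -2.0)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

best_params, best_error = pso_minimize(validation_error, [(-3.0, 3.0), (-5.0, 1.0)])
print(best_params, best_error)
```

In a real hybrid, `validation_error` would train the learner with the candidate hyperparameters and return its error on held-out data, which makes each evaluation far more expensive than in this toy case.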

Table 6. Previous studies on statistical models for predictive analytics modeling in the O&G field.

| Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages |
|---|---|---|---|---|---|---|---|---|---|---|
| [122] | SARIMA, LSTM, and AR | Temporal | Transformer | DGA; 610 samples | Prediction | H₂, CH₄, C₂H₄, C₂H₆, CO, CO₂, and total hydrocarbon (TH) | Dissolved gas concentration | ARE | SARIMA | The SARIMA method had a good average accuracy. |
| [62] | LSTM and ARIMA | Temporal | Wells | Longmaxi Formation of the Sichuan Basin; 3650 samples | Prediction | Date and daily production | Shale gas production | MAE, RMSE, and R² | LSTM; Accuracy = 0.63% | The accuracy of the model needs improvement. |
| [123] | GM, FGM, DGGM, ARIMA, PSOGM, and PSO-FDGGM | Temporal | Gas | Quarterly production of natural gas in China | Prediction | Training period and natural gas production | Natural gas production | MAPE | PSO-FDGGM; MAPE = 3.19% | The model’s performance was noteworthy and reliable. |

Furthermore, Chung et al. [126] investigated PCA, SVM, and LDA for temporal
predictions in oil. Their study utilized 30 real-time oil samples, where the pore size (R)
remained constant and the capillary flow rate (l²/t) was a function of interfacial properties
(γLG and θ) and viscosity (µ), to predict oil types.
The evaluation metric used was accuracy, emphasizing the capability of the SVM to capture
the underlying patterns in the temporal dataset, with a reported accuracy of 90%. In the
experiment by Mohamadian et al. [127], the analysis focused on a non-temporal well-log
dataset from three drilled wellbores. The researchers employed ML models, specifically
Multilayer Perceptron with PSO (MLP-PSO) and Multilayer Perceptron with GA (MLP-GA),
for the prediction task involving variables such as depth, compressional wave velocity (Vp),
shear wave velocity (Vs), bulk density (ρ), and pore pressure (Pp), with the target being the
probable depth of casing collapse. The dataset included 22,323 samples, and the evaluation
metrics comprised R² and RMSE. The MLP-PSO model was more accurate than the other
models.
Next, the research by Sabah et al. [128] concentrated on drilling activity, utilizing
non-temporal data from 305 wells drilled in the Marun oil field. The researchers
tested several ML models, including hybrids of the Least Square Support Vector
Machine (LSSVM) with COA, PSO, and GA, as well as MLP-COA, MLP-PSO, MLP-GA, LSSVM, and
MLP. The input parameters were northing, easting, depth, meterage, time of drilling,
formation type, size of hole, weight on bit, flow rate, weight of mud, MFVIS, retort solid,
pore pressure, fracture pressure, fan 600/fan 300, Gel 10min/Gel 10s, pump pressure, and
rpm. The target variable was the severity of mud loss. The MLP-GA model was reported as
the best, with an RMSE of 93%, although its accuracy can still be improved. Shi et al. [129]
used a Hybrid-Physics Guided-Variational Bayesian Spatial-Temporal Neural Network to
analyze natural gas across time. The study aimed to forecast natural gas concentrations
using a dataset of 600 samples. The predictor variables were geometry size, release point
position, release diameter, released gas, volumetric release rate, duration, and sensor
placement. R² was used as the evaluation metric, and the Hybrid-Physics Guided-Variational
Bayesian Spatial-Temporal Neural Network achieved an R² of 99%. These findings imply
that the Hybrid-Physics Guided-Variational Bayesian Spatial-Temporal Neural Network
enhanced the spatiotemporal forecasting performance.
Furthermore, Machado et al. [130] carried out a temporal analysis of well data,
specifically the 3W well dataset. The research involved the application of LSTM
and One-Class Support Vector Machine (OCSVM) models for classification, utilizing a
dataset comprising 1984 samples. The classification task used the sensor variables
P-PDG, P-TPT, T-TPT, P-MON-CKP, and T-JUS-CKP to identify two types of faults. The
evaluation metrics included recall, specificity, and accuracy, with the OCSVM model
achieving an accuracy of 91%. The study found that feature selection did not improve
classifier accuracy, and the proposed model demonstrated a lack of robustness in
classifying the two types of faults in the well data. Carvalho et al. [10] also performed a
temporal analysis of the 3W well data. The study used ML models such as Ordered
Nearest Neighbors (ONN), Weighted Nearest Neighbors, LDA, and QDA to perform a
classification task with 1984 samples. The classification sought to forecast flow instability
from the variables P-PDG, P-TPT, T-TPT, P-MON-CKP, T-JUS-CKP, and CLASS. The evaluation
measures included recall, specificity, and accuracy, with the ONN reaching an accuracy
of 81%. However, the study’s authors recommended looking into different metaheuristic
methodologies, indicating a possibility for better performance in forecasting flow instability
from the well data.
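Recall, specificity, and accuracy, the measures used in both 3W studies, are all derived from the confusion matrix of a binary classifier. The following Python sketch shows the computation; the labels are invented for illustration (1 = fault, 0 = normal) and are not taken from the 3W dataset.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall, specificity, precision, and F1 score."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical labels: 4 faulty and 6 normal observations.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(name, round(value, 4))
```

Because fault events are usually rare, accuracy alone can look high even when many faults are missed, which is why recall and specificity are reported alongside it.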
In the study by Zhou et al. [131], the analysis in the reservoir domain was conducted
with DT and SVM models on high-resolution non-temporal Formation Micro-Imager (FMI)
data. The classification task aimed to categorize how logging units react to sedimentary
pyroclastic rock, regular pyroclastic rock, and pyroclastic lava for lithologically classifying
pyroclastic rocks. The SVM model had an impressive accuracy of 98.6%, surpassing
the threshold of 95%. The study emphasized the efficacy of the suggested model in
lithologic classification by highlighting its significantly superior performance. In Zhang
et al.’s [132] study, which involved a temporal analysis in the pipeline domain, CNN,
SVM, and SVM+CNN models were applied to a leakage dataset containing 1000 samples.
The prediction task used length, outer diameter, wall thickness, and location in the
model to predict leakage in tight sandstone reservoirs. The SVM+CNN model achieved a
high accuracy of 95.5%, outperforming the CNN and SVM models. This highlights the
advantages of the suggested methodology for anticipating leaks in tight sandstone
reservoirs. Collectively, these studies highlight the application of alternative ML models,
specifically SVM and MLP, in addressing various predictive analytics challenges in the
O&G industry. The selection of the model depends on the nature of the data and the specific
predictive task at hand, showcasing the versatility and effectiveness of these models in
optimizing predictions for different parameters and scenarios.
Zuo et al. [133] addressed natural gas leakage in SCADA data using autoencoder networks
hybridized with OCSVM, together with a few other ML models, including Basic Autoencoder
(BAE), Convolutional Autoencoder (CAE), LSTM with Autoencoder (AE), RF, PCA, Variational
Autoencoders (VAE), and LSTM-AE-isolation forest (IF), with 9980 samples of input data, to
demonstrate the efficiency of DL models in managing complicated and time-varying gas
data for precise classification. The proposed model, i.e., LSTM-AE-OCSVM, achieved
the highest accuracy, 98%, and the researchers proposed using anomalous data in future
studies. Meanwhile, Martinez and Rocha [67] focused on reservoirs and used 3257 samples
from the Volve and UNISIM-IIH oil fields to examine LSTM and GRU models. With an
impressive R² of 99%, the GRU model demonstrated its superiority in O&G forecasting
from oil, gas, water, or pressure data. Within the field of reservoir clustering, Chen
et al. [134] applied K-Means Clustering (K-MC) and KNN models to a range of shale
reservoirs, including Antrim, Barnett, Eagle Ford, Woodford, Fayetteville, Haynesville, and
Marcellus. With 55,623 samples involving well location, depth, length, and production
starting year, the K-MC model outperformed the alternative models, with an R² of 0.18. To
classify wells using the 3W oil well dataset, Fernandes et al. [135] investigated models such
as OCSVM, LOF, Elliptical Envelope, and AE using feedforward and LSTM. The LOF model
showed an F1 score of 85%, with an emphasis on fault identification utilizing parameters
such as P-PDG and T-JUS-CKGL. Although deemed acceptable, the accuracy of the suggested
approach can be increased.
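As an illustration of the clustering step mentioned above, the following sketch implements the standard K-Means (Lloyd) iteration in plain Python. The two-dimensional points and the initial centroids are hypothetical; they stand in for scaled well attributes and are not data from the reviewed study.

```python
import math

def kmeans(points, centroids, n_iters=10):
    """Lloyd's algorithm: alternate point assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Hypothetical scaled attributes for six wells forming two groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.1), (4.9, 4.8), (5.2, 5.0)]

# Initial centroids chosen by hand here; real studies typically use
# random or k-means++ initialization.
labels, centroids = kmeans(points, [points[0], points[-1]])
print(labels)
print(centroids)
```

The result depends on the initial centroids and on how the features are scaled, so attributes such as depth and production year would normally be standardized before clustering.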
In the domain of non-temporal well analysis in the oil fields in the Middle East,
Gao et al. [136] utilized the group method of data handling (GS-GMDH) models with
2748 samples. The researchers predicted pore pressure based on various parameters such
as gamma ray (spectral) (SGR), density (RHOB), gamma ray (corrected) (CGR), and sonic
transit time (DT). The GS-GMDH model exhibited an RMSE of 1.88 psi and an R² of 0.9997,
showcasing higher accuracy. Using geological data from 180 samples, Cirac et al. [137]
investigated a few models, including RF, Gradient Boosting Regressor, Bagging, CNN,
KNN, and Deep Hierarchical Decomposition models, in their investigation of temporal
reservoir analysis. Their input parameters included porosity,
fracture porosity, fracture permeability, rock type, net gross, matrix permeability, water
relative permeability, formation volume factor, rock compressibility, pressure dependence
of water viscosity, gas density, water density, vertical continuity, relative permeability
curves, oil–water contact, and fluid viscosity, and the targets were oil production, water
production, water injection, and liquid production. The Deep Hierarchical Decomposition
model decreased computing speed, with an MAE for oil production of 0.76%. Within
the framework of gas analysis, Dayev et al. [138] employed the M5P tree model and RF,
Random Tree, Reduced Error Pruning Tree (REPT), GPR, SVM, and Multivariate Adaptive
Regression Spline (MARS) models with 201 samples from a Coriolis flow meter. They used
the wet gas flow rate (kg/h) and absolute gas humidity (g/m³) to estimate the
dry gas flow rate (kg/h). The GPR-RBKF model outperformed the other models, with an MAE
of 163.3266 kg/h and an RMSE of 483.1359 kg/h. Table 7 summarizes previous works on
the application of ML models for predictive analytics modeling in O&G fields.

Table 7. Previous works on the application of ML models for predictive analytics modeling in O&G fields.

| Reference | Models | Temporality | Field | Dataset | Class | Input Parameter | Output Parameter | Performance Metrics | Best Model | Advantages/Disadvantages |
|---|---|---|---|---|---|---|---|---|---|---|
| [124] | Multivariate Empirical Mode Decomposition with Genetic Algorithm, LSSVM-GA, and LSSVM-PSO | Non-temporal | Crude oils | Bubble point pressure and oil formation volume factor; 638 samples | Clustering | Temperature (T), oil gravity (API), gas specific gravity (γg), and ratio of gas oil solution | Bubble point pressure and oil formation volume factor of crude oils | RMSE | MELM-PSO | The hybrid proposed model outperformed the empirical method. |
| [126] | PCA, SVM, and LDA | Temporal | Oil | Real-time oil samples; 30 samples | Classification | Pore size remained the same. The capillary flow rate (l²/t) was a function of interfacial properties (γLG and θ) and viscosity (µ). | Oil types | Accuracy | SVM; Accuracy = 90% | The proposed model needs improvement in accuracy because the accuracy < 95%. |
| [127] | MLP-PSO and MLP-GA | Non-temporal | Well log | Three wellbores drilled; 22,323 samples | Prediction | Well depth, compressional wave velocity (Vp), shear wave velocity (Vs), bulk density (ρ), and pressure pore (Pp) | Probable depth of casing collapse | R² and RMSE | MLP-PSO | The proposed model outperformed the other models’ accuracy. |
| [128] | LSSVM-COA, LSSVM-PSO, LSSVM-GA, MLP-COA, MLP-PSO, MLP-GA, LSSVM, and MLP | Non-temporal | Drilling | 305 drilled wells in the Marun oil field; 2820 samples | Prediction | Northing, easting, depth, meterage, formation type, hole size, WOB, flow rate, MW, MFVIS, retort solid, pore pressure, drilling time, fracture pressure, fan 600/fan 300, gel10min/gel10s, pump pressure, and RPM | Severity of mud loss | R² and RMSE | MLP-GA; RMSE = 93% | The accuracy of the proposed model can be improved. |
| [129] | Hybrid-Physics Guided-Variational Bayesian Spatial-Temporal neural network | Temporal | Gas | Natural gas; 600 samples | Prediction | Size of geometry, release point position, release diameter, released gas, volumetric release rate, length of release, and sensor location | Natural gas concentration | R² | Hybrid_PG_VBSTnn; R² = 99% | The proposed integration enhanced the spatiotemporal forecasting performance. |
| [125] | CNN, Linear SVM, Gaussian SVM, and SVM+CNN | Temporal | Gas | Leakage dataset; 1000 samples | Classification | Methane, ethane, propane, isobutane, butane, helium, nitrogen, hydrogen sulfide, carbon dioxide | Gas pipeline leakage estimation | Accuracy | SVM; Accuracy = 95.5% | The model stood out for its exceptional performance. |
| [130] | LSTM and OCSVM | Temporal | Well | 3W; 1984 samples | Classification | P-PDG, P-TPT, T-TPT, P-MON-CKP, and T-JUS-CKP | Identify two types of faults | Recall, specificity, and accuracy | OCSVM; Accuracy = 91% | The use of feature selection did not improve the classifier accuracy. The proposed model was not robust enough to classify 2 types of wells. |
| [10] | Ordered Nearest Neighbors, Weighted Nearest Neighbors, LDA, and QDA | Temporal | Well | 3W; 1984 samples | Classification | P-PDG, P-TPT, T-TPT, P-MON-CKP, T-JUS-CKP, and CLASS | Predicting flow instability | Recall, specificity, and accuracy | ONN; Accuracy = 81% | The author suggested investigating another metaheuristic method. |
| [132] | CNN, SVM, and SVM+CNN | Temporal | Pipeline | Leakage dataset; 1000 samples | Prediction | Length, outer diameter, wall thickness, and location in the model | Prediction in tight sandstone reservoirs | Accuracy | SVM+CNN; Accuracy = 95.5% | The SVM+CNN model outperformed the CNN and SVM. |
| [131] | DT and SVM | Non-temporal | Reservoir | High-resolution FMI data | Classification | Response of logging, pyroclastic lava, normal pyroclastic rock, and sedimentary pyroclastic rock | Lithologic classification of pyroclastic rocks | Accuracy | SVM; Accuracy = 98.6% | The SVM accuracy (98.6%) was higher than 95%. |
| [133] | BAE-OCSVM, CAE-OCSVM, LSTM-AE-OCSVM, RD-OCSVM, RF-OCSVM, PCA-OCSVM, VAE-OCSVM, and LSTM-AE-IF | Temporal | Gas | Data from SCADA; 9980 samples | Classification | Diameter, wall thickness, and length | Leakage of natural gas | AUC, accuracy, F1 score, precision, TPR, and FPR | LSTM-AE-OCSVM; Accuracy = 98% | The best model achieved higher accuracy, and the author suggested using abnormal data for future work. |
| [67] | LSTM and GRU | Temporal | Reservoirs | UNISIM-IIH and Volve oilfield; 3257 samples | Classification | Oil, gas, water, or pressure | Oil and gas forecasting | SMAPE and R² | GRU; R² = 99% | The proposed model had the highest accuracy. |
| [135] | OCSVM, LOF, Elliptical Envelope, and Autoencoder with feedforward+LSTM | Temporal | Well | 3W; 1984 samples | Classification | P-PDG, P-TPT, T-TPT, P-MON-CKP, T-JUS-CKP, P-JUS-CKGL, T-JUS-CKGL, QGL, and Label vector | Fault detection | F1 score | LOF; F1 score = 85% | The proposed method needs improvement in accuracy. |
| [134] | K-Means Clustering and KNN | Temporal | Reservoirs | Antrim, Barnett, Eagle Ford, Woodford, Fayetteville, Haynesville, and Marcellus; 55,623 samples | Clustering | Well location, well depth, well length, and production starting year | EUR predictions | R² | K-MC; R² = 0.18 | The proposed model outperformed the other models using average fitting parameters. |
| [136] | GS-GMDH | Non-temporal | Well | Oil fields located in the Middle East; 2748 samples | Prediction | Laterolog (LLS), photoelectric index (PEF), compressional wave velocity (Vp), porosity (NPHI), gamma ray (spectral) (SGR), density (RHOB), gamma ray (corrected) (CGR), shear wave velocity (Vs), caliper (CALI), resistivity (ILD), and sonic transit time (DT) | Pore pressure | RMSE, R², MSE, SI, and ENS | GS-GMDH; RMSE = 1.88 psi and R² = 0.9997 | GS-GMDH had the best accuracy. |
| [137] | RF, Gradient Boosting Regressor, Bagging, CNN, KNN, and Deep Hierarchical Decomposition | Temporal | Reservoir | Geological data; 180 samples | Classification | Porosity, fracture porosity, fracture permeability, rocky type, net gross, matrix permeability, water relative permeability, formation volume factor, rock compressibility, pressure dependence of water viscosity, gas density, water density, vertical continuity, relative permeability curves, oil–water contact, and fluid viscosity | Oil production, water production, water injection, and liquid production | MAE and SMAPE | Deep Hierarchical Decomposition; MAE: OP = 0.76% | The proposed method decreased the computational speed. |
| [138] | M5P tree model, RF, Random Tree, Reduced Error Pruning Tree, GPR, SVM, and MARS | Non-temporal | Gas | Coriolis flow meter; 201 samples | Classification | Wet gas flow rate (kg/h) and absolute gas humidity (g/m³) | Estimation of the dry gas flow rate (kg/h) | RMSE, MAE, LMI, and WI | GPR-RBKF; MAE = 163.3266 kg/h, RMSE = 483.1359 kg/h, CC = 0.9915 for the dataset used for testing | The best model was superior to the other models, and the author suggested exploring other soft-computing methods. |

3. Literature Review Assessment


Analyzing and evaluating the existing literature is crucial for survey research, as
it gives readers an in-depth discussion of the field. Based on the
preceding review of ML-based models for predictive analytics modeling in O&G
fields, this section summarizes and discusses several key points.
• Tables 1–7 provide a comprehensive overview of the reviewed papers, presenting
essential details such as the author names, applied AI model types, temporality of
the dataset, domain of the O&G model in the study, dataset sources, number of data
samples, input and output parameters, performance measures employed,
best models found, and advantages or drawbacks of the models. The
researchers consistently focused on carefully selecting input combinations for O&G
predictive analytics modeling.
• ANN models can be expanded from binary to multiclass cases. Furthermore, the
complexity of ANN models may be easily changed by modifying model structure
and learning methods and assigning transfer functions using empirical evidence
or correlation analysis. The findings revealed that ANNs could effectively predict,
classify, or cluster O&G cases, including crater width in buried gas pipelines, corrosion
defect depth, flowing bottom-hole pressure in vertical oil wells, concentrations of
gas-phase pollutants for contamination removal, drilling-related occurrences based
on epochs, age, formation, lithology, and fields, as well as predicting gas routes and
chimneys in drilling activities and DGA datasets. ANNs may be compared to various
models, like the SARIMA and QDA.
• In the reviewed articles from 2021 to 2023, RF has become much more popular in
predictive analytics for O&G than other modeling techniques, such as the MLP, DT, and
LSTM, because it prevents overfitting and is more accurate in prediction. In the
O&G sector, RF appears to be a typical, flexible, and effective ML framework because
of its capacity to handle complicated O&G datasets that may be fragmented. The
O&G industry has become another field with data scarcity for modeling. In pipeline
failure risk prediction and transformer fault classification, RF is included in model
ensembles to help achieve good results. Its use in drilling, well data analysis, lithology
identification, crude oil data analysis, and burst pressure prediction demonstrates RF’s
robust application performance. RF stands out for its dependability, obtaining excellent
accuracy, precision, and recall values in many applications within the O&G area,
which underlines its applicability to multiple data formats such as binary or multiclass
cases.
• The O&G industry has seen a rise in the use of DL, an effective subset of ML, especially
for predicting the lifespan of equipment and modeling groundwater levels. DL
frameworks, especially the CNN and LSTM, outperform other models in prediction
accuracy. Industry uses of DL include assessing algorithm performance, integrating
data into DL algorithms, and developing simulation frameworks. Significant studies
demonstrate DL’s efficacy in estimating oil output and pressure in wells, identifying
pipeline fractures, and producing hydrocarbons in the gas sector. The evaluations
of hybrid models, such as DCNN+LSTM and LSTM+Seq2Seq, show outstanding
accuracy, indicating DL’s potential for optimizing operations and decision-making
processes in the O&G field. The hybrid model is more efficient due to feature extraction
and the capacity to learn patterns in extended data sequences.
• AI models are widely employed in the O&G sector to deliver predictive analytics. In
non-linear modeling, SVR is a kernel-based ML method often used to map data to
a higher-dimensional space. This makes it an effective tool for regression problems
with complicated interactions between input and target variables. MLR is still an excellent
approach for examining dependencies since it is a powerful tool for analyzing the
connection between a dependent variable and several independent variables. Non-temporal
gas well data are analyzed using MLR, SVR, and GPR models because they provide a
good blend of interpretability, simplicity, performance, and adaptability. However, the
decision between these models is ultimately determined by the dataset’s particular
properties and the problem’s needs. Other research focused on the temporal
prediction of corrosion in pipes using several AI models, with the RNN showing
promising results. Non-temporal O&G production categorization, reservoir data
analysis, and transformer fault prediction were all explored using various AI models,
demonstrating industry flexibility.
• The O&G sector replicates real-world system behavior with mathematical models,
namely regression and time series analysis. Statistical models such as the SARIMA, AR,
and ARIMA are more accurate since they account for temporal relationships. Research
has validated the efficacy of the SARIMA in forecasting DGA gas concentrations
in transformers, highlighting its ability to capture seasonal fluctuations based on
each temporal data point. These techniques forecast shale gas output, producing a
satisfactory mean outcome. It has been proven that statistical approaches are adaptable
to dealing with temporal dependencies and forecasting concerns in the O&G area.
• The limited sample size of the datasets utilized in earlier research on predictive analytics
in O&G industries is a key limitation that can have a major impact on the results’
generalizability and dependability. It is challenging to obtain reliable results from small
samples since they frequently result in more variability and less accurate
estimates. This limitation may also lead to a loss of statistical power, which lowers
the capacity to identify important variations or connections in the data. Additionally,
there is a higher chance that a smaller sample may not accurately reflect
the larger population, which could introduce bias and restrict the findings’ application
to other groups. Therefore, to maintain robustness and accuracy, researchers need to
take precautions when interpreting studies based on limited datasets and should consider
confirming their findings using larger and more varied samples.
• A few input parameters from various sensors were used to detect defects in wells
in predictive analytics tasks, including classification, clustering, and forecasting. Because
of the data’s accessibility and availability, researchers regularly employ P-PDG, P-TPT,
T-TPT, P-MON-CKP, and T-JUS-CKP (five parameters) as input parameters. Data limitations
are widespread due to the difficulty of drilling wells in severe environments such as
the deep sea. However, two variants of the RF model were implemented in a
previous study, one using 15 input parameters and one using
five, and the performance results of the two models were compared. The
outcomes of employing the 15 input parameters with the DT model were superior to
those of the five-input-parameter models. Table 8 outlines the input parameters utilized by the
researchers in their research papers.
• Detecting internal transformer failures is another O&G-related topic that has been
the subject of several previous studies. Specifically, a few gas compositions, namely
acetylene (C₂H₂), ethylene (C₂H₄), ethane (C₂H₆), methane (CH₄), and hydrogen (H₂),
were used as input variables across most of the studies
because of the high correlation between these input variables and the target variables in
detecting the fault in the transformer. However, the use of other parameters such
as total hydrocarbon (TH), carbon monoxide (CO), carbon dioxide (CO₂), ammonia
(NH₃), acetaldehyde (CH₃CHO), acetone ((CH₃)₂CO), toluene (C₆H₅CH₃), oxygen (O₂),
nitrogen (N₂), and ethanol (CH₃CH₂OH) varied between studies. Because these parameters
rank lower in correlation with the target
variables, not all the studies included them.
The model comparison covered studies that used only C₂H₂, C₂H₄, C₂H₆, CH₄, and H₂
(five variables); in these, the KNN, QDA, and LGBM models had accuracies of 88%,
99.29%, and 87.06%, respectively.
In contrast, the MTGNN, KNN+SMOTE, and RF models obtained accuracies of
92%, 98%, and 96.2%, respectively, when they employed C₂H₂,
C₂H₄, C₂H₆, CH₄, H₂, TH, CO, CO₂, NH₃, CH₃CHO, (CH₃)₂CO, C₆H₅CH₃, O₂, N₂,
and CH₃CH₂OH (15 variables). As can be observed from the average
accuracies (about 95.4% versus about 91.5%), the use of 15 variables produces better
outcomes than the five-variable models. Previous research publications may be found in Table 9.
• Table 10 summarizes the input parameters for well logging predictive analytics
models. The researchers commonly used 14 parameters for well logging, including
gamma ray (GR), sonic (Vp), deep and shallow resistivities (LLD and LLS), neutron
porosity (NPHI), density (RHOB), caliper (CALI), neutron (NEU), sonic transit time
(DT), bulk density (DEN), deep resistivity (RD), true resistivity (RT), shallow resistivity
(RES SLW), total porosity (PHIT), and water saturation (SW). The correlation coefficient
between the input parameters and the target variable, together with the data type
(numerical or categorical), is essential for determining which parameters are
appropriate for predictive analytics. Thus, a few important variables can be chosen to
construct the best model for increased accuracy. The study that used all 14 variables
with XGBoost achieved 97%, whereas the study that utilized only GR, Vp, LLD and LLS,
NPHI, and RHOB with an LSTM model achieved a slightly lower result of 94%. The three
well-known datasets discussed above (3W, DGA, and well logging), which have been
utilized in recent research in the O&G sector, demonstrate the importance of examining
the correlation between target and input parameters to decide which variables allow
models to provide significant outcomes.
• The assessment of O&G research revealed an overall increase in published papers over
time. As seen in Figure 2, the rise in O&G discoveries, driven by the dependence of
technological advancements on gas and petroleum and by the annual progress of ML
and AI tools, has resulted in more studies in this field utilizing AI-based models. In
2021, 32 research papers were published in this field. The number of articles released
in 2022 decreased by seven, to 25 published research papers. This reduction can be
attributed to the continued development of AI and the gradual progression of interest
in O&G research. The trend turned positive again in 2023, with 34 articles published in
this field. This increase may reflect growing recognition of the need to improve
AI-based models in the O&G area. Many O&G companies have followed the IR4.0 road
to integrate AI into their organizations and reduce future costs by forecasting
future events.
• Throughout the research period, developments in AI models resulted in more
complicated and interconnected models, giving researchers tools to construct more
exact and resilient models. A similar finding was reached while investigating the use
of various models in predictive analytics in the O&G industry during the last three
years. Figure 4a depicts a breakdown of the most common model types used for
predictive analytics in the O&G industry, illustrated by a pie chart. The chart shows
that the largest share, 37% of all models, is classified as “others”, which primarily
includes foundational models such as SVR, GRU, MLP, and boosting-based models (shown
in Figure 4b). Due to their improved efficiency, accuracy, and capacity to handle
non-linear datasets, these models have become quite popular. This selection of models
shows that there is still a lot of remaining potential in this field.
• The analysis of predictive analytics research publications from 2021 to 2023 focuses
heavily on several areas of the O&G sector. Crude oils (7), oil (5), reservoirs (16),
pipelines (16), drilling (5), wells (20), transformers (10), gas (10), and lithology (2)
all appear as recurring subjects across different studies. The frequency of these terms
demonstrates the industry’s strong interest in using predictive analytics to optimize
operations and decision-making in various sectors, including reservoir management,
drilling procedures, pipeline integrity, and transformer health. This trend represents
a deliberate effort in the O&G industry to use sophisticated analytics for greater effi-
ciency, risk management, and overall operational excellence. Figure 5 is the graphical
summary of the types of O&G sectors in research articles.
• Several performance measures have been utilized in O&G research, demonstrating
diverse assessment criteria for predictive analytics models (see Figure 6). The
performance metrics help in understanding the models’ performance since they can
reveal many model characteristics. Figure 6a, which shows the various performance
measures used in the research, demonstrates that accuracy (49) was the most preferred
measure for comparing the correctly predicted values against the actual ones. This
performance measure is appropriate for categorical data types and classification
predictive analysis because it is simple to grasp and is informative when all the
classes are balanced. However, utilizing accuracy for unbalanced classes has
limitations since it can be deceptive; alternative measures like precision, recall,
F1 score, or AUC may be more helpful. Aside from that, the researchers’ second most
chosen performance indicator is R² (41). This performance indicator is commonly
employed in regression analysis and with numerical data since it measures how much of
the variation in the dependent variable is explained by the independent variables.
• Furthermore, R² is simple to read because it typically ranges from 0 to 1, with values
closer to 1 indicating that the model explains more of the variability in the dependent
variable. However, there is a disadvantage to using only R² to demonstrate how well
the model fits. One of the disadvantages is that it is vulnerable to outliers; even a
single outlier might alter the results. Figure 6b is an expansion of the “others”
section that depicts the additional performance indicators used in the previous studies.
• Based on the data presented in Table 11, a thorough analysis of model performance for
diverse applications identifies numerous key performers across multiple categories. In
the field of ANNs, notable high performers include an ANN model with an accuracy of
99.6% and an ANN integrated with PSO (ANN+PSO) with 99% accuracy. This suggests
that adding optimization techniques such as PSO can considerably improve ANN
performance. DL models also perform well, with DCNN+LSTM obtaining 99.37%
accuracy and GRU models reaching 99% accuracy. These studies demonstrate the
effectiveness of DL systems, particularly in managing complicated data patterns.
• Within the class of Fuzzy Logic and Neuro-fuzzy models, every variation—LSSVM+CSA,
ANFIS+PCA, and Control Chart+RF—achieves 99% accuracy on average. This consis-
tency emphasizes the dependability of Fuzzy Logic systems in certain applications. DT,
RF, and hybrid models exhibit considerable variability, with top performers such as DT
and CATBOOST reaching 99.9% accuracy. However, the high number of models with
much lower accuracies indicates a considerable sensitivity to certain data properties
and model settings.
• Interrelated AI models, particularly the SVR combined with the Genetic Algorithm and
Particle Swarm Optimization (SVR+GA+PSO), outperform others with 99% accuracy,
demonstrating the potential of hybrid approaches to increase prediction accuracy.
ARIMA is the most accurate statistical model in the research, with a performance of
63%. However, it has limitations when dealing with complex datasets compared to
advanced AI models.
• Finally, in predictive analytics for the O&G domain, the Hybrid-Physics Guided-
Variational Bayesian Spatial-Temporal Neural Network and GRU models approach
99% accuracy, demonstrating the usefulness of merging domain-specific knowledge
with sophisticated neural network designs. ANN and DL models perform well in a
variety of situations, but using hybrid approaches and optimization techniques can
improve their accuracy even more. However, the difference in performance across
DT and RF models indicates that careful model selection and tuning are necessary to
achieve optimal outcomes.
• The study indicates various patterns in model performance. ANNs generally show
excellent accuracy but have a few outliers; the MLP, for example, has 10% accuracy.
DL models perform well for the most part, although there is some volatility, as seen
in Faster R-CNN+ClusterRPN’s 71% accuracy. Fuzzy Logic models provide particularly
consistent high performance. DT and RF models are very variable, with some obtaining
outstanding accuracy and others doing poorly. Interrelated AI models have consistently
obtained excellent accuracy. Statistical models, such as ARIMA, perform poorly
compared to other categories, showing their limits with complicated datasets.
Predictive analytics models normally perform well, yet there is a significant outlier
in this category: K+MC, with 18% accuracy.
• Performance levels differ among model categories, as shown in Figure 7. ANN models
perform well on average, with an accuracy of 89.23%, but performance can vary
greatly depending on specific variations and modifications, as shown by several
outliers. DL models perform well, with an average accuracy of 93.73%, demonstrating
less variability and solid outcomes across diverse versions. Fuzzy Logic and Neuro-
fuzzy models stand out for their excellent and constant performance, with an average
accuracy of 99%, making them extremely trustworthy for their applications. DT,
RF, and hybrid models exhibit great variability; although models like CATBOOST
and DT attain excellent accuracy, others, such as RF+Analog-to-digital converters,
perform poorly. Interrelated AI models perform consistently well, with an average
accuracy of 97.67%. In comparison, the ARIMA model from the statistical model
category performs inadequately, with 63% accuracy, demonstrating limits in dealing
with complex information. Models used for predictive analytics in the O&G field
typically perform well, although there are a few distinct exceptions. Overall, while
the most advanced AI models perform well, the variability within particular categories
emphasizes the significance of model selection and tuning for the best outcomes.
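
The caution about using accuracy on unbalanced classes can be made concrete with a small sketch. The example below is illustrative only: the 95/5 split between normal and faulty samples and the function name `classification_metrics` are assumptions, not values or code from the surveyed studies. It shows how a model that never predicts the minority class can still report high accuracy while recall and F1 score fall to zero.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical unbalanced data: 95 normal samples (0) and 5 faulty samples (1).
y_true = [0] * 95 + [1] * 5
# A trivial model that always predicts "normal".
y_pred = [0] * 100

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this assumed scenario the accuracy is 0.95 even though none of the faulty samples is detected, which is why precision, recall, F1 score, or AUC are usually reported alongside accuracy for fault detection tasks.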

Table 8. Input parameters of undesirable well events from 3W datasets.

Studies compared: [86], [99], [22], [73], [130], [87], [88], [10], [85], [135].

Input Parameter of Undesirable Well Events    Studies Marked (of 10)
P-PDG                                         10
P-TPT                                         9
T-TPT                                         9
P-MON-CKP                                     8
T-JUS-CKP                                     7
T-JUS-CKGL                                    3
P-JUS-CKGL                                    3
P-CKGL
QGL                                           4
T-PDG
T-PCK                                         2

Table 9. Input parameters for the fault detection of transformer oil from the DGA dataset.

Studies compared: [35], [122], [40], [83], [20], [98], [59], [139], [65], [111].

Input Parameter of Internal Transformer Defects    Studies Marked (of 10)
Acetylene (C2H2)                                   8
Ethylene (C2H4)                                    9
Ethane (C2H6)                                      9
Methane (CH4)                                      9
Hydrogen (H2)                                      8
Total Hydrocarbon (TH)
Carbon Monoxide (CO)                               5
Carbon Dioxide (CO2)                               5
Ammonia (NH3)
Acetaldehyde (CH3CHO)
Acetone ((CH3)2CO)
Nitrogen (N2)
Ethanol (CH3CH2OH)
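
The statement that the 15-variable transformer models outperform the five-variable models on average can be checked with simple arithmetic. The sketch below only averages the accuracies quoted in this review for the two groups; the dictionary and variable names are illustrative.

```python
from statistics import mean

# Accuracies (%) quoted in the text for models using five gas variables.
five_variable_models = {"KNN": 88.0, "QDA": 99.29, "LGBM": 87.06}
# Accuracies (%) quoted in the text for models using 15 gas variables.
fifteen_variable_models = {"MTGNN": 92.0, "KNN+SMOTE": 98.0, "RF": 96.2}

avg_five = mean(list(five_variable_models.values()))
avg_fifteen = mean(list(fifteen_variable_models.values()))

print(f"Five-variable models:    {avg_five:.2f}%")
print(f"Fifteen-variable models: {avg_fifteen:.2f}%")
print(f"Difference:              {avg_fifteen - avg_five:.2f} percentage points")
```

The averages come out at about 91.45% and 95.40%. The comparison rests on three models per group taken from different studies, so the gap of roughly four percentage points should be read as indicative rather than as a controlled result.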

Figure 4. Preferred AI model types in the research articles about predictive analytics in the O&G
field: (a) overview of the AI models used in the publications and (b) extended “others” section.

Table 10. Input parameters of well logging.

Studies compared: [64], [106], [104], [140], [100], [108].

Input Parameter of Well Logging                  Studies Marked (of 6)
Gamma Ray (GR)                                   6
Sonic (Vp)                                       2
Deep and Shallow Resistivities (LLD and LLS)     2
Neutron Porosity (NPHI)                          2
Density (RHOB)                                   4
Caliper (CALI)                                   3
Neutron (NEU)                                    3
Sonic Transit Time (DT)                          4
Bulk Density (DEN)                               2
Deep Resistivity (RD)                            1
True Resistivity (RT)                            1
Shallow Resistivity (RES SLW)                    2
Total Porosity (PHIT)
Water Saturation (SW)
Compressional Slowness (DTC)
Depth                                            1
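
The correlation-based screening of input parameters described for the well logging models can be sketched as follows. This is a minimal illustration under stated assumptions: the data are synthetic, the labels GR, RHOB, and CALI are borrowed from Table 10 only as names, and the coefficients used to build the target are hypothetical. The helper names `pearson` and `rank_features` are not from the surveyed studies.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def rank_features(features, target):
    """Rank features by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(values, target))
              for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic stand-in data (assumption): three logs drawn as random noise.
rng = random.Random(0)
n_samples = 500
features = {name: [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
            for name in ("GR", "RHOB", "CALI")}

# Hypothetical target that depends strongly on GR, weakly on RHOB, not on CALI.
target = [2.0 * g - 1.0 * r + rng.gauss(0.0, 0.5)
          for g, r in zip(features["GR"], features["RHOB"])]

ranking = rank_features(features, target)
for name, score in ranking:
    print(f"{name}: |r| = {score:.3f}")
```

Pearson correlation only captures linear relationships between numerical variables, so categorical inputs or strongly non-linear dependencies would need a different screening measure.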

Figure 5. Types of O&G sectors in research articles from 2021 to 2023.



Figure 6. Preferred performance metrics by the researcher: (a) combination of performance metrics
used in publications. (b) All additional performance metrics displayed.
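
The sensitivity of R² to outliers noted in the discussion can be illustrated with a short calculation. The sketch below uses made-up values (an assumption, not data from the surveyed studies) and a hand-written `r_squared` helper based on the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observations and close predictions.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.9, 5.0]
r2_clean = r_squared(y_true, y_pred)

# Same predictions, except that the last one is badly wrong.
y_pred_outlier = y_pred[:-1] + [8.0]
r2_outlier = r_squared(y_true, y_pred_outlier)

print(f"R^2 without outlier: {r2_clean:.3f}")
print(f"R^2 with one outlier: {r2_outlier:.3f}")
```

With these assumed values, one poor prediction out of five lowers R² from about 0.99 to below 0.1, which is why error measures such as MAE or RMSE are commonly reported together with R².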

Table 11. A summary of each ML method’s accuracy for predictive analytics in the O&G industry
from previous studies.

ML Methods / Model Variants                                  Model Performance (%)

Artificial Neural Network
  LWQPSO-ANN                                                 95
  ANN                                                        93
  ANN                                                        99.6
  ANN                                                        90
  DNN                                                        146
  ANN+PSO                                                    99
  ANN                                                        97
  MTGNN                                                      92
  Multilayer Perceptron Backpropagation                      89
  GA backpropagation neural network                          97
  MLP                                                        10
  DE+ELM                                                     49.7

Deep Learning
  DCNN+LSTM                                                  99.37
  LSTM                                                       94
  KNN+SMOTE                                                  98
  DL                                                         99
  GRU                                                        99
  Faster R-CNN+ClusterRPN                                    71

Fuzzy Logic and Neuro-fuzzy
  LSSVM+CSA                                                  99
  ANFIS+PCA                                                  99
  Control Chart+RF                                           99

Decision Tree, Random Forest, and Hybrid
  XGBOOST                                                    85
  XGBOOST                                                    96
  EL                                                         84
  QDA                                                        99.29
  DT                                                         62.9
  RF                                                         99.6
  DT                                                         97
  DT                                                         99.9
  XGBOOST                                                    62
  CATBOOST                                                   99.9
  KNN                                                        88
  CATBOOST                                                   99
  DF+K-MEANS                                                 90
  GSK+XGBOOST                                                50
  LGBM                                                       87.06
  RF                                                         91
  RF                                                         94
  RF                                                         95
  RF+PCA                                                     90
  K-MEANS+RF                                                 98
  RF                                                         17.49
  RF+Analog-to-digital converters                            9
  RF                                                         96
  XGBOOST                                                    92
  XGBOOST                                                    97
  Recursive feature elimination+PSO+ADABOOST                 83
  Adaptive+RF                                                73
  RF                                                         90
  BayesOpt+XGBOOST                                           93
  RF                                                         80.06

Interrelated AI
  RNN                                                        98
  ARN                                                        96
  SVR+GA+PSO                                                 99

Statistical model
  ARIMA                                                      63

ML model utilized for predictive analytics in the O&G field
  SVM                                                        90
  MLP+GA                                                     93
  Hybrid-Physics Guided-Variational Bayesian
    Spatial-Temporal Neural Network                          99
  SVM                                                        95.5
  OCSVM                                                      91
  ONN                                                        81
  SVMCNN                                                     95.5
  AVM                                                        98.6
  LSTM+AE+OCSVM                                              98
  GRU                                                        99
  LOF                                                        85
  K+MC                                                       18
  Deep Hierarchical Decomposition                            76
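
The category averages discussed in the text can be reproduced from Table 11 by grouping the reported values and taking their mean. The sketch below does this for three of the smaller categories and adds a simple range check, since an accuracy expressed as a percentage should lie between 0 and 100. The data structure, the choice of categories, and the helper name `out_of_range` are illustrative assumptions.

```python
from statistics import mean

# Subset of Table 11: model performance (%) grouped by ML method category.
table_11_subset = {
    "Fuzzy Logic and Neuro-fuzzy": [("LSSVM+CSA", 99), ("ANFIS+PCA", 99),
                                    ("Control Chart+RF", 99)],
    "Interrelated AI": [("RNN", 98), ("ARN", 96), ("SVR+GA+PSO", 99)],
    "Statistical model": [("ARIMA", 63)],
}

# Average performance per category, rounded to two decimals.
category_average = {
    category: round(mean(value for _, value in rows), 2)
    for category, rows in table_11_subset.items()
}


def out_of_range(rows, low=0.0, high=100.0):
    """Return entries whose reported percentage falls outside [low, high]."""
    return [(name, value) for name, value in rows if not low <= value <= high]


# A few Artificial Neural Network rows from Table 11, checked for plausibility.
ann_rows = [("LWQPSO-ANN", 95), ("ANN", 99.6), ("DNN", 146), ("MLP", 10)]
flagged = out_of_range(ann_rows)

print(category_average)
print("Entries to re-check against the cited study:", flagged)
```

The three averages agree with the values quoted in the discussion (99%, 97.67%, and 63%). Entries flagged by the range check, such as the 146 listed for DNN, are candidates for re-checking against the cited study before they are included in any average.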
Figure 7. Average accuracy of ML models in the O&G industry.

4. Future Research Directions
As predictive analytics in the O&G industry continues to evolve, several avenues for
future research and development emerge. First, exploring the integration of advanced
Deep Learning techniques, such as RNN and LSTM networks, could enhance the temporal
predictive capabilities of existing models. These architectures are adept at capturing
sequential dependencies and time series patterns, which could prove invaluable for
forecasting dynamic aspects like O&G production rates or pipeline conditions. Second,
investigating explainability and interpretability in complex models, such as ensemble
techniques and Deep Learning networks, continues to be an important area of research.
Developing methods to elucidate the decision-making processes of these models can enhance
the trust and acceptance of predictive analytics in decision support systems within the
O&G domain.
Furthermore, there is potential for extending research into the optimization of hybrid
models, focusing on refining parameter-tuning strategies and evaluating the robustness
of these approaches across diverse datasets and scenarios. For instance, understanding
how QPSO or FDGGM parameters impact model performance could lead to more effective
and efficient hybrid predictive systems. Additionally, exploring predictive analytics for
emerging challenges in the industry, such as sustainability, environmental impact, and
safety, could open new avenues for research. Predicting the environmental consequences
of O&G activities or developing models for proactive safety monitoring could contribute
significantly to the industry’s responsible and sustainable practices.
Finally, comprehensive benchmarking studies are needed to compare the performance
of various predictive models under many circumstances and datasets. This could facilitate
the identification of the most suitable models for specific applications within the O&G
sector, providing practitioners with insightful information for making decisions. In
conclusion, future research in predictive analytics for the O&G industry should delve into
advanced Deep Learning architectures, enhance model interpretability, optimize hybrid
approaches, address emerging challenges, and conduct systematic benchmarking studies to
advance the state-of-the-art methods in this critical domain.

5. Conclusions
This review aimed to provide a thorough overview of the utilization of ML models in
predictive analytics within the O&G sector. We collected studies published from 2021 to
2023 in reputable journals indexed in Web of Science, Science Direct, Scopus, and
IEEE. The analysis revealed that seven categories of ML models had been employed in
predictive analytics modeling for the O&G industry. The survey identified key elements
of current predictive analytics models for the O&G field. These elements included model
types, temporal aspects of the data and the field, the name of the data, dataset types,
predictive analytics methodologies (such as classification, clustering, or prediction),
model input and output parameters, performance metrics, optimal models, and the
advantages and disadvantages of the models. Rigorous scientific assessments and
evaluations were conducted on the surveyed studies, leading to detailed discussions on
numerous findings. This review also highlights various potential future research
directions based on the current state of the literature, providing insightful
information to interested professionals in this sector.

Author Contributions: P.A.R.A., writing—original draft preparation and visualization; M.Y., review
and editing and supervision; and M.T.M.S.-d., funding acquisition. All authors have read and agreed
to the published version of the manuscript.
Funding: This research was funded by Petronas Research Sdn. Bhd. (PRSB), grant number
20220801012.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: This study did not report any data.
Conflicts of Interest: The authors declare no conflicts of interest.

Abbreviations

Abbreviation Definition Abbreviation Definition


RF Random Forest DNN Deep Neural Network
GAM Generalized Additive Model MELM Multivariate Empirical Mode Decomposition
NN Neural Network ANFIS Adaptive Neuro-Fuzzy Inference System
SVR-GA Support Vector Regression with Genetic Algorithm SOM Self-Organizing Map
SVR-PSO Support Vector Regression with Particle Swarm Optimization ANN Artificial Neural Network
SVR-FFA Support Vector Regression with Firefly Algorithm MRGC Maximum Relevant Gain Clustering
GB Gradient Boosting CatBoost Categorical Boosting
LSSVM-CSA Least Squares Support Vector Machine with Cuckoo Search Algorithm MLR Multiple Linear Regression
AHC Agglomerative Hierarchical Clustering SVM Support Vector Machine
XGBoost Extreme Gradient Boosting FN Fuzzy Network
GPR Gaussian Process Regression LDA Linear Discriminant Analysis
LSSVM Least Squares Support Vector Machine LWQPSO-ANN Linearly Weighted Quantum Particle Swarm Optimization with Artificial Neural Network
PCA Principal Component Analysis DL Deep Learning
MLP-ANN Multilayer Perceptron with Artificial Neural Network MLSTM Multilayer Long Short-Term Memory
MLP-PSO Multilayer Perceptron with Particle Swarm Optimization GRU Gated Recurrent Unit
DT Decision Tree AdaBoost Adaptive Boosting
LSTM Long Short-Term Memory LSTM-AE-IF Long Short-Term Memory Autoencoder with Isolation Forest
KNN k-Nearest Neighbors DNN Deep Neural Network
NB Naive Bayes CNN Convolutional Neural Network
GP Genetic Programming O&G Oil and Gas
ELM Extreme Learning Machine AI Artificial Intelligence
DF Deep Forest MSE Mean Squared Error
QDA Quadratic Discriminant Analysis MAPE Mean Absolute Percentage Error
ML Machine Learning AAPE Arithmetic Average Percentage Error
DGA Dissolved Gas Analysis SMAPE Symmetric Mean Absolute Percentage Error
RMSE Root Mean Squared Error RSE Relative Squared Error
MAE Mean Absolute Error RFR Random Forest Regression
AUC Area Under the Curve FNACC Faulty-Normal Accuracy
ARE Absolute Relative Error TPC Total Percent Correct
EVS Explained Variance Score VAF Variance Accounted For
DTR Decision Tree Regression WI Weighted Index
PLR Polynomial Linear Regression LMI Linear Mean Index
SNR Signal-to-Noise Ratio AP Average Precision
RFNACC Real Faulty-Normal Accuracy MAP Mean Average Percentage
RMSPE Root Mean Square Percentage Error ARD Absolute Relative Difference
MARE Mean Absolute Relative Error MPa Megapascal
SI Severity Index P-JUS-CKGL Pressure Downstream of Gas Lift Choke
ENS Energy Normalized Score P-CKGL Pressure Downstream of Gas Lift Choke (CKGL)
MPE Mean Percentage Error QGL Gas Lift Flow Rate
R Correlation Coefficient T-PDG Temperature at the Permanent Downhole Gauge Sensor
AARD Average Absolute Relative Deviation T-PCK Temperature Downstream of the Production Choke
P-PDG Pressure at Permanent Downhole Gauge (PDG) LSB Least Square Boosting
P-TPT Pressure at Temperature/Pressure Transducer (TPT) PLS Partial Least Squares
T-TPT Temperature at TPT FPM Feature Projection Model
P-MON-CKP Pressure Upstream of Production Choke (CKP) FP-DNN Feature Projection-Deep Neural Network
T-JUS-CKP Temperature Downstream of CKP GNN Graph Neural Network
T-JUS-CKGL Temperature Downstream of CKGL MLP Multilayer Perceptron
FP-PLS Feature Projection-PLS Bi-LSTM Bidirectional Long Short-Term Memory
MGGP Multi-Gene Genetic Programming SHAP Shapley Additive Explanation
xNES Exponential Natural Evolution Strategies LR Logistic Regression
RNN Recurrent Neural Network LOF Local Outlier Factor
LGBM Light Gradient Boosting Machine ICA Imperialist Competitive Algorithm
SMOTE Synthetic Minority Oversampling Technique SFLA Shuffled Frog-Leaping Algorithm
LIME Local Interpretable Model-Agnostic Explanations SA Simulated Annealing
XAI Explainable Artificial Intelligence PBBLR Physics-Based Bayesian Linear Regression
GSK Gaining-Sharing Knowledge-Based Algorithm ARIMA Autoregressive Integrated Moving Average
BayesOpt-XGBoost Bayesian Optimization XGBoost GM Generalized Method of Moments
FA Firefly Algorithm PSO-FDGGM PSO-Based Data Grouping Grey Model with a Fractional Order Accumulation
COA Cuckoo Optimization Algorithm PSOGM PSO for Grey Model
GWO Grey Wolf Optimizer LSSVM Least Square Support Vector Machine
HAS Harmony Search GA Genetic Algorithm
BLR Bayesian Linear Regression OCSVM One-Class Support Vector Machine
SARIMA Seasonal Autoregressive Integrated Moving Average BAE Basic Autoencoder
GM Grey Model CAE Convolutional Autoencoder
FGM Fractional Grey Model AE Autoencoder
DGGM Data Grouping-Based Grey Modeling Method VAE Variational Autoencoder
GPR Gaussian Process Regression MARS Multivariate Adaptive Regression Spline

References
1. Liang, J.; Li, C.; Sun, K.; Zhang, S.; Wang, S.; Xiang, J.; Hu, S.; Wang, Y.; Hu, X. Activation of mixed sawdust and spirulina with
or without a pre-carbonization step: Probing roles of volatile-char interaction on evolution of pyrolytic products. Fuel Process.
Technol. 2023, 250, 107926. [CrossRef]
2. Xu, L.; Wang, Y.; Mo, L.; Tang, Y.; Wang, F.; Li, C. The research progress and prospect of data mining methods on corrosion
prediction of oil and gas pipelines. Eng. Fail. Anal. 2023, 144, 106951. [CrossRef]
3. Yusoff, M.; Ehsan, D.; Sharif, M.Y.; Sallehud-Din, M.T.M. Topology Approach for Crude Oil Price Forecasting of Particle Swarm
Optimization and Long Short-Term Memory. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 524–532. [CrossRef]
4. Yusoff, M.; Sharif, M.Y.; Sallehud-Din, M.T.M. Long Term Short Memory with Particle Swarm Optimization for Crude Oil Price
Prediction. In Proceedings of the 2023 7th International Symposium on Innovative Approaches in Smart Technologies (ISAS),
Istanbul, Turkiye, 23–25 November 2023; pp. 1–4. [CrossRef]
5. Sharma, R.; Villányi, B. Evaluation of corporate requirements for smart manufacturing systems using predictive analytics. Internet
Things 2022, 19, 100554. [CrossRef]
6. Mahfuz, N.M.; Yusoff, M.; Ahmad, Z. Review of single clustering methods. IAES Int. J. Artif. Intell. 2019, 8, 221–227. [CrossRef]
7. Henrys, K. Role of Predictive Analytics in Business. SSRN Electron. J. 2021. [CrossRef]
8. Tewari, S.; Dwivedi, U.D.; Biswas, S. A novel application of ensemble methods with data resampling techniques for drill bit
selection in the oil and gas industry. Energies 2021, 14, 432. [CrossRef]
9. Allouche, I.; Zheng, Q.; Yoosef-Ghodsi, N.; Fowler, M.; Li, Y.; Adeeb, S. Enhanced predictive method for pipeline strain demand
subject to permanent ground displacements with internal pressure & temperature: A finite difference approach. J. Infrastruct.
Intell. Resil. 2023, 2, 100030. [CrossRef]
10. Carvalho, B.G.; Vargas, R.E.V.; Salgado, R.M.; Munaro, C.J.; Varejao, F.M. Flow Instability Detection in Offshore Oil Wells with
Multivariate Time Series Machine Learning Classifiers. In Proceedings of the 2021 IEEE 30th International Symposium on
Industrial Electronics (ISIE), Kyoto, Japan, 20–23 June 2021; pp. 1–6. [CrossRef]
11. Ohalete, N.C.; Aderibigbe, A.O.; Ani, E.C.; Ohenhen, P.E.; Akinoso, A. Advancements in predictive maintenance in the oil and
gas industry: A review of AI and data science applications. World J. Adv. Res. Rev. 2023, 20, 167–181. [CrossRef]
12. Tariq, Z.; Aljawad, M.S.; Hasan, A.; Murtaza, M.; Mohammed, E.; El-Husseiny, A.; Alarifi, S.A.; Mahmoud, M.; Abdulraheem, A.
A Systematic Review of Data Science and Machine Learning Applications to the Oil and Gas Industry. J. Pet. Explor. Prod. Technol.
2021, 11, 4339–4374. [CrossRef]
13. Yu, X.; Wang, J.; Hong, Q.-Q.; Teku, R.; Wang, S.-H.; Zhang, Y.-D. Transfer learning for medical images analyses: A survey.
Neurocomputing 2022, 489, 230–254. [CrossRef]
14. Barkana, B.D.; Ozkan, Y.; Badara, J.A. Analysis of working memory from EEG signals under different emotional states. Biomed.
Signal Process. Control. 2022, 71, 103249. [CrossRef]
15. Chen, W.; Huang, H.; Huang, J.; Wang, K.; Qin, H.; Wong, K.K. Deep learning-based medical image segmentation of the aorta
using XR-MSF-U-Net. Comput. Methods Programs Biomed. 2022, 225, 107073. [CrossRef] [PubMed]
16. Huang, C.; Gu, B.; Chen, Y.; Tan, X.; Feng, L. Energy return on energy, carbon, and water investment in oil and gas resource
extraction: Methods and applications to the Daqing and Shengli oilfields. Energy Policy 2019, 134, 110979. [CrossRef]
17. Hazboun, S.; Boudet, H. Chapter 8—A ‘thin green line’ of resistance? Assessing public views on oil, natural gas, and coal export
in the Pacific Northwest region of the United States and Canada. In Public Responses to Fossil Fuel Export; Boudet, H., Hazboun, S.,
Eds.; Elsevier: Amsterdam, The Netherlands, 2022; pp. 121–139.
18. Champeecharoensuk, A.; Dhakal, S.; Chollacoop, N.; Phdungsilp, A. Greenhouse gas emissions trends and drivers insights from
the domestic aviation in Thailand. Heliyon 2024, 10, e24206. [CrossRef] [PubMed]
19. Centobelli, P.; Cerchione, R.; Del Vecchio, P.; Oropallo, E.; Secundo, G. Blockchain technology for bridging trust, traceability and
transparency in circular supply chain. Inf. Manag. 2022, 59, 103508. [CrossRef]
20. Majed, H.; Al-Janabi, S.; Mahmood, S. Data Science for Genomics (GSK-XGBoost) for Prediction Six Types of Gas Based on
Intelligent Analytics. In Proceedings of the 2022 22nd International Conference on Computational Science and Its Applications
(ICCSA), Malaga, Spain, 4–7 July 2022; pp. 28–34. [CrossRef]
21. Waterworth, A.; Bradshaw, M.J. Unconventional trade-offs? National oil companies, foreign investment and oil and gas
development in Argentina and Brazil. Energy Policy 2018, 122, 7–16. [CrossRef]
22. Marins, M.A.; Barros, B.D.; Santos, I.H.; Barrionuevo, D.C.; Vargas, R.E.; de M. Prego, T.; de Lima, A.A.; de Campos, M.L.; da
Silva, E.A.; Netto, S.L. Fault detection and classification in oil wells and production/service lines using random forest. J. Pet. Sci.
Eng. 2020, 197, 107879. [CrossRef]
23. Dhaked, D.K.; Dadhich, S.; Birla, D. Power output forecasting of solar photovoltaic plant using LSTM. Green Energy Intell. Transp.
2023, 2, 100113. [CrossRef]
24. Yan, R.; Wang, S.; Peng, C. An Artificial Intelligence Model Considering Data Imbalance for Ship Selection in Port State Control
Based on Detention Probabilities. J. Comput. Sci. 2021, 48, 101257. [CrossRef]
25. Agwu, O.E.; Okoro, E.E.; Sanni, S.E. Modelling oil and gas flow rate through chokes: A critical review of extant models. J. Pet. Sci.
Eng. 2022, 208, 109775. [CrossRef]
26. Nandhini, K.; Tamilpavai, G. Hybrid CNN-LSTM and modified wild horse herd Model-based prediction of genome sequences
for genetic disorders. Biomed. Signal Process. Control. 2022, 78, 103840. [CrossRef]
27. Balaji, S.; Karthik, S. Deep Learning Based Energy Consumption Prediction on Internet of Things Environment. Intell. Autom. Soft
Comput. 2023, 37, 727–743. [CrossRef]
28. Yang, H.; Liu, X.; Chu, X.; Xie, B.; Zhu, G.; Li, H.; Yang, J. Optimization of tight gas reservoir fracturing parameters via gradient
boosting regression modeling. Heliyon 2024, 10, e27015. [CrossRef] [PubMed]
29. de los Ángeles Sánchez Morales, M.; Anguiano, F.I.S. Data science—Time series analysis of oil & gas production in mexican fields.
Procedia Comput. Sci. 2022, 200, 21–30. [CrossRef]
30. Tan, Y.; Al-Huqail, A.A.; Chen, Q.; Majdi, H.S.; Algethami, J.S.; Ali, H.E. Analysis of groundwater pollution in a petroleum
refinery energy contributed in rock mechanics through ANFIS-AHP. Int. J. Energy Res. 2022, 46, 20928–20938. [CrossRef]
31. Wu, M.; Wang, G.; Liu, H. Research on Transformer Fault Diagnosis Based on SMOTE and Random Forest. In Proceedings of
the 2022 4th International Conference on Electrical Engineering and Control Technologies (CEECT), Shanghai, China, 16–18
December 2022; pp. 359–363. [CrossRef]
32. Dashti, Q.; Matar, S.; Abdulrazzaq, H.; Al-Shammari, N.; Franco, F.; Haryanto, E.; Zhang, M.Q.; Prakash, R.; Bolanos, N.; Ibrahim,
M.; et al. Data Analytics into Hydraulic Modelling for Better Understanding of Well/Surface Network Limits, Proactively Identify
Challenges and, Provide Solutions for Improved System Performance in the Greater Burgan Field. In Proceedings of the Abu
Dhabi International Petroleum Exhibition & Conference, Abu Dhabi, United Arab Emirates, 15–18 November 2021. [CrossRef]
33. Wang, X.; Daryapour, M.; Shahrabadi, A.; Pirasteh, S.; Razavirad, F. Artificial neural networks in predicting of the gas molecular
diffusion coefficient. Chem. Eng. Res. Des. 2023, 200, 407–418. [CrossRef]
34. Kamarudin, R.; Ang, Y.; Topare, N.; Ismail, M.; Mustafa, K.; Gunnasegaran, P.; Abdullah, M.; Mazlan, N.; Badruddin, I.; Zedan, A.;
et al. Influence of oxyhydrogen gas retrofit into two-stroke engine on emissions and exhaust gas temperature variations. Heliyon
2024, 10, e26597. [CrossRef] [PubMed]
35. Raghuraman, R.; Darvishi, A. Detecting Transformer Fault Types from Dissolved Gas Analysis Data Using Machine Learning
Techniques. In Proceedings of the 2022 IEEE 15th Dallas Circuit and System Conference (DCAS), Dallas, TX, USA, 17–19 June
2022; pp. 1–5. [CrossRef]
36. Mukherjee, T.; Burgett, T.; Ghanchi, T.; Donegan, C.; Ward, T. Predicting Gas Production Using Machine Learning Methods: A
Case Study. In Proceedings of the SEG International Exposition and Annual Meeting, San Antonio, TX, USA, 25 September 2019;
pp. 2248–2252. [CrossRef]
37. Dixit, N.; McColgan, P.; Kusler, K. Machine Learning-Based Probabilistic Lithofacies Prediction from Conventional Well Logs: A
Case from the Umiat Oil Field of Alaska. Energies 2020, 13, 4862. [CrossRef]
38. Aldosari, H.; Elfouly, R.; Ammar, R. Evaluation of Machine Learning-Based Regression Techniques for Prediction of Oil and Gas
Pipelines Defect. In Proceedings of the 2020 International Conference on Computational Science and Computational Intelligence
(CSCI), Las Vegas, NV, USA, 16–18 December 2020; pp. 1452–1456. [CrossRef]
39. Elmousalami, H.H.; Elaskary, M. Drilling stuck pipe classification and mitigation in the Gulf of Suez oil fields using artificial
intelligence. J. Pet. Explor. Prod. Technol. 2020, 10, 2055–2068. [CrossRef]
40. Taha, I.B.; Mansour, D.-E.A. Novel Power Transformer Fault Diagnosis Using Optimized Machine Learning Methods. Intell.
Autom. Soft Comput. 2021, 28, 739–752. [CrossRef]
41. Tiyasha; Tung, T.M.; Yaseen, Z.M. A survey on river water quality modelling using artificial intelligence models: 2000–2020. J.
Hydrol. 2020, 585, 124670. [CrossRef]
42. Agatonovic-Kustrin, S.; Beresford, R. Basic concepts of artificial neural network (ANN) modeling and its application in
pharmaceutical research. J. Pharm. Biomed. Anal. 2000, 22, 717–727. [CrossRef] [PubMed]
43. Tao, H.; Hameed, M.M.; Marhoon, H.A.; Zounemat-Kermani, M.; Heddam, S.; Kim, S.; Sulaiman, S.O.; Tan, M.L.; Sa’adi, Z.; Mehr,
A.D.; et al. Groundwater level prediction using machine learning models: A comprehensive review. Neurocomputing 2022, 489,
271–308. [CrossRef]
44. Kalam, S.; Yousuf, U.; Abu-Khamsin, S.A.; Bin Waheed, U.; Khan, R.A. An ANN model to predict oil recovery from a 5-spot
waterflood of a heterogeneous reservoir. J. Pet. Sci. Eng. 2022, 210, 110012. [CrossRef]
45. Eckert, E.; Bělohlav, Z.; Vaněk, T.; Zámostný, P.; Herink, T. ANN modelling of pyrolysis utilising the characterisation of
atmospheric gas oil based on incomplete data. Chem. Eng. Sci. 2007, 62, 5021–5025. [CrossRef]
46. Qin, G.; Xia, A.; Lu, H.; Wang, Y.; Li, R.; Wang, C. A hybrid machine learning model for predicting crater width formed by
explosions of natural gas pipelines. J. Loss Prev. Process. Ind. 2023, 82, 104994. [CrossRef]
47. Wang, Q.; Song, Y.; Zhang, X.; Dong, L.; Xi, Y.; Zeng, D.; Liu, Q.; Zhang, H.; Zhang, Z.; Yan, R.; et al. Evolution of corrosion
prediction models for oil and gas pipelines: From empirical-driven to data-driven. Eng. Fail. Anal. 2023, 146, 107097. [CrossRef]
48. Sami, N.A.; Ibrahim, D.S. Forecasting multiphase flowing bottom-hole pressure of vertical oil wells using three machine learning
techniques. Pet. Res. 2021, 6, 417–422. [CrossRef]
49. Chohan, H.Q.; Ahmad, I.; Mohammad, N.; Manca, D.; Caliskan, H. An integrated approach of artificial neural networks and
polynomial chaos expansion for prediction and analysis of yield and environmental impact of oil shale retorting process under
uncertainty. Fuel 2022, 329, 125351. [CrossRef]
50. Carvalho, G.d.A.; Minnett, P.J.; Ebecken, N.F.F.; Landau, L. Machine-Learning Classification of SAR Remotely-Sensed Sea-Surface
Petroleum Signatures—Part 1: Training and Testing Cross Validation. Remote Sens. 2022, 14, 3027. [CrossRef]
51. Li, X.; Han, W.; Shao, W.; Chen, L.; Zhao, D. Data-Driven Predictive Model for Mixed Oil Length Prediction in Long-Distance
Transportation Pipeline. In Proceedings of the 2021 IEEE 10th Data Driven Control and Learning Systems Conference (DDCLS),
Suzhou, China, 14–16 May 2021; pp. 1486–1491. [CrossRef]
52. Mendoza, J.H.; Tariq, R.; Espinosa, L.F.S.; Anguebes, F.; Bassam, A. Soft Computing Tools for Multiobjective Optimization of
Offshore Crude Oil and Gas Separation Plant for the Best Operational Condition. In Proceedings of the 2021 18th International
Conference on Electrical Engineering, Computing Science and Automatic Control (CCE), Mexico City, Mexico, 10–12 November
2021; pp. 1–6. [CrossRef]
53. Sakhaei, A.; Zamir, S.M.; Rene, E.R.; Veiga, M.C.; Kennes, C. Neural network-based performance assessment of one- and
two-liquid phase biotrickling filters for the removal of a waste-gas mixture containing methanol, α-pinene, and hydrogen sulfide.
Environ. Res. 2023, 237, 116978. [CrossRef] [PubMed]
54. Hasanzadeh, M.; Madani, M. Deterministic tools to predict gas assisted gravity drainage recovery factor. Energy Geosci. 2023, 5,
100267. [CrossRef]
55. Zhang, X.-Q.; Cheng, Q.-L.; Sun, W.; Zhao, Y.; Li, Z.-M. Research on a TOPSIS energy efficiency evaluation system for crude oil
gathering and transportation systems based on a GA-BP neural network. Pet. Sci. 2023, 21, 621–640. [CrossRef]
56. Ismail, A.; Ewida, H.F.; Nazeri, S.; Al-Ibiary, M.G.; Zollo, A. Gas channels and chimneys prediction using artificial neural networks
and multi-seismic attributes, offshore West Nile Delta, Egypt. J. Pet. Sci. Eng. 2022, 208, 109349. [CrossRef]
57. Goliatt, L.; Saporetti, C.; Oliveira, L.; Pereira, E. Performance of evolutionary optimized machine learning for modeling total
organic carbon in core samples of shale gas fields. Petroleum 2023, 10, 150–164. [CrossRef]
58. Amar, M.N.; Ghahfarokhi, A.J.; Ng, C.S.W.; Zeraibi, N. Optimization of WAG in real geological field using rigorous soft computing
techniques and nature-inspired algorithms. J. Pet. Sci. Eng. 2021, 206, 109038. [CrossRef]
59. Mao, W.; Wei, B.; Xu, X.; Chen, L.; Wu, T.; Peng, Z.; Ren, C. Power transformers fault diagnosis using graph neural networks
based on dissolved gas data. J. Phys. Conf. Ser. 2022, 2387, 012029. [CrossRef]
60. Ghosh, I.; Chaudhuri, T.D.; Alfaro-Cortés, E.; Gámez, M.; García, N. A hybrid approach to forecasting futures prices with
simultaneous consideration of optimality in ensemble feature selection and advanced artificial intelligence. Technol. Forecast. Soc.
Chang. 2022, 181, 121757. [CrossRef]
61. Wang, B.; Guo, Y.; Wang, D.; Zhang, Y.; He, R.; Chen, J. Prediction model of natural gas pipeline crack evolution based on
optimized DCNN-LSTM. Mech. Syst. Signal Process. 2022, 181, 109557. [CrossRef]
62. Yang, R.; Liu, X.; Yu, R.; Hu, Z.; Duan, X. Long short-term memory suggests a model for predicting shale gas production. Appl.
Energy 2022, 322, 119415. [CrossRef]
63. Werneck, R.d.O.; Prates, R.; Moura, R.; Gonçalves, M.M.; Castro, M.; Soriano-Vargas, A.; Júnior, P.R.M.; Hossain, M.M.; Zampieri,
M.F.; Ferreira, A.; et al. Data-driven deep-learning forecasting for oil production and pressure. J. Pet. Sci. Eng. 2022, 210, 109937.
[CrossRef]
64. Antariksa, G.; Muammar, R.; Nugraha, A.; Lee, J. Deep sequence model-based approach to well log data imputation and
petrophysical analysis: A case study on the West Natuna Basin, Indonesia. J. Appl. Geophys. 2023, 218, 105213. [CrossRef]
65. Das, S.; Paramane, A.; Chatterjee, S.; Rao, U.M. Accurate Identification of Transformer Faults from Dissolved Gas Data Using
Recursive Feature Elimination Method. IEEE Trans. Dielectr. Electr. Insul. 2023, 30, 466–473. [CrossRef]
66. Barjouei, H.S.; Ghorbani, H.; Mohamadian, N.; Wood, D.A.; Davoodi, S.; Moghadasi, J.; Saberi, H. Prediction performance
advantages of deep machine learning algorithms for two-phase flow rates through wellhead chokes. J. Pet. Explor. Prod. Technol.
2021, 11, 1233–1261. [CrossRef]
67. Martínez, V.; Rocha, A. The Golem: A General Data-Driven Model for Oil & Gas Forecasting Based on Recurrent Neural Networks.
IEEE Access 2023, 11, 41105–41132. [CrossRef]
68. Wang, Z.; Bai, L.; Song, G.; Zhang, Y.; Zhu, M.; Zhao, M.; Chen, L.; Wang, M. Optimized faster R-CNN for oil wells detection from
high-resolution remote sensing images. Int. J. Remote Sens. 2023, 44, 6897–6928. [CrossRef]
69. Hiassat, A.; Diabat, A.; Rahwan, I. A genetic algorithm approach for location-inventory-routing problem with perishable products.
J. Manuf. Syst. 2017, 42, 93–103. [CrossRef]
70. Sharma, V.; Cali, Ü.; Sardana, B.; Kuzlu, M.; Banga, D.; Pipattanasomporn, M. Data-driven short-term natural gas demand
forecasting with machine learning techniques. J. Pet. Sci. Eng. 2021, 206, 108979. [CrossRef]
71. Phan, H.C.; Duong, H.T. Predicting burst pressure of defected pipeline with Principal Component Analysis and adaptive Neuro
Fuzzy Inference System. Int. J. Press. Vessel. Pip. 2021, 189, 104274. [CrossRef]
72. Hamedi, H.; Zendehboudi, S.; Rezaei, N.; Saady, N.M.C.; Zhang, B. Modeling and optimization of oil adsorption capacity on
functionalized magnetic nanoparticles using machine learning approach. J. Mol. Liq. 2023, 392, 123378. [CrossRef]
73. Castro, A.O.D.S.; Santos, M.D.J.R.; Leta, F.R.; Lima, C.B.C.; Lima, G.B.A. Unsupervised Methods to Classify Real Data from
Offshore Wells. Am. J. Oper. Res. 2021, 11, 227–241. [CrossRef]
74. Ma, B.; Shuai, J.; Liu, D.; Xu, K. Assessment on failure pressure of high strength pipeline with corrosion defects. Eng. Fail. Anal.
2013, 32, 209–219. [CrossRef]
75. Shuai, Y.; Shuai, J.; Xu, K. Probabilistic analysis of corroded pipelines based on a new failure pressure model. Eng. Fail. Anal.
2017, 81, 216–233. [CrossRef]
76. Phan, H.C.; Dhar, A.S.; Mondal, B.C. Revisiting burst pressure models for corroded pipelines. Can. J. Civ. Eng. 2017, 44, 485–494.
[CrossRef]
77. Freire, J.; Vieira, R.; Castro, J.; Benjamin, A. Part 3: Burst tests of pipeline with extensive longitudinal metal loss. Exp. Tech. 2006,
30, 60–65. [CrossRef]
78. Cronin, D.S. Assessment of Corrosion Defects in Pipelines. Ph.D. Thesis, University of Waterloo, Waterloo, ON, Canada, 2000.
79. Ghasemieh, A.; Lloyed, A.; Bahrami, P.; Vajar, P.; Kashef, R. A novel machine learning model with Stacking Ensemble Learner for
predicting emergency readmission of heart-disease patients. Decis. Anal. J. 2023, 7, 100242. [CrossRef]
80. Jeny, J.R.V.; Reddy, N.S.; Aishwarya, P.; Samreen. A Classification Approach for Heart Disease Diagnosis using Machine Learning.
In Proceedings of the 2021 6th International Conference on Signal Processing, Computing and Control (ISPCC), Solan, India, 7–9
October 2021; pp. 456–459. [CrossRef]
81. Mazumder, R.K.; Salman, A.M.; Li, Y. Failure risk analysis of pipelines using data-driven machine learning algorithms. Struct. Saf.
2021, 89, 102047. [CrossRef]
82. Liu, S.; Zhao, Y.; Wang, Z. Artificial Intelligence Method for Shear Wave Travel Time Prediction considering Reservoir Geological
Continuity. Math. Probl. Eng. 2021, 2021, 5520428. [CrossRef]
83. Saroja, S.; Haseena, S.; Madavan, R. Dissolved Gas Analysis of Transformer: An Approach Based on ML and MCDM. IEEE Trans.
Dielectr. Electr. Insul. 2023, 30, 2429–2438. [CrossRef]
84. Raj, R.A.; Sarathkumar, D.; Venkatachary, S.K.; Andrews, L.J.B. Classification and Prediction of Incipient Faults in Transformer
Oil by Supervised Machine Learning using Decision Tree. In Proceedings of the 2023 3rd International conference on Artificial
Intelligence and Signal Processing (AISP), Vijayawada, India, 18–20 March 2023; pp. 1–6. [CrossRef]
85. Aslam, N.; Khan, I.U.; Alansari, A.; Alrammah, M.; Alghwairy, A.; Alqahtani, R.; Alqahtani, R.; Almushikes, M.; AL Hashim, M.
Anomaly Detection Using Explainable Random Forest for the Prediction of Undesirable Events in Oil Wells. Appl. Comput. Intell.
Soft Comput. 2022, 2022, 1558381. [CrossRef]
86. Turan, E.M.; Jaschke, J. Classification of undesirable events in oil well operation. In Proceedings of the 2021 23rd International
Conference on Process Control (PC), Strbske Pleso, Slovakia, 1–4 June 2021; pp. 157–162. [CrossRef]
87. Gatta, F.; Giampaolo, F.; Chiaro, D.; Piccialli, F. Predictive maintenance for offshore oil wells by means of deep learning features
extraction. Expert Syst. 2022, 41, e13128. [CrossRef]
88. Brønstad, C.; Netto, S.L.; Ramos, A.L.L. Data-driven Detection and Identification of Undesirable Events in Subsea Oil Wells. In
Proceedings of the SENSORDEVICES 2021 Twelfth International Conference on Sensor Device Technologies and Applications,
Athens, Greece, 14–18 November 2021; pp. 1–6.
89. Ben Jabeur, S.; Khalfaoui, R.; Ben Arfi, W. The effect of green energy, global environmental indexes, and stock markets in
predicting oil price crashes: Evidence from explainable machine learning. J. Environ. Manag. 2021, 298, 113511. [CrossRef]
[PubMed]
90. Baabbad, H.K.H.; Artun, E.; Kulga, B. Understanding the Controlling Factors for CO2 Sequestration in Depleted Shale Reservoirs
Using Data Analytics and Machine Learning. In Proceedings of the SPE EuropEC—Europe Energy Conference featured at the
83rd EAGE Annual Conference & Exhibition, Madrid, Spain, 6–9 June 2022. [CrossRef]
91. Alsaihati, A.; Elkatatny, S.; Mahmoud, A.A.; Abdulraheem, A. Use of Machine Learning and Data Analytics to Detect Downhole
Abnormalities While Drilling Horizontal Wells, with Real Case Study. J. Energy Resour. Technol. Trans. ASME 2021, 143, 043201.
[CrossRef]
92. Kumar, A.; Hassanzadeh, H. A qualitative study of the impact of random shale barriers on SAGD performance using data
analytics and machine learning. J. Pet. Sci. Eng. 2021, 205, 108950. [CrossRef]
93. Ma, H.; Wang, H.; Geng, M.; Ai, Y.; Zhang, W.; Zheng, W. A new hybrid approach model for predicting burst pressure of corroded
pipelines of gas and oil. Eng. Fail. Anal. 2023, 149, 107248. [CrossRef]
94. Canonaco, G.; Roveri, M.; Alippi, C.; Podenzani, F.; Bennardo, A.; Conti, M.; Mancini, N. A Machine-Learning Approach for the
Prediction of Internal Corrosion in Pipeline Infrastructures. In Proceedings of the 2021 IEEE International Instrumentation and
Measurement Technology Conference (I2MTC), Glasgow, UK, 17–20 May 2021; pp. 1–6. [CrossRef]
95. Fang, J.; Cheng, X.; Gai, H.; Lin, S.; Lou, H. Development of machine learning algorithms for predicting internal corrosion of
crude oil and natural gas pipelines. Comput. Chem. Eng. 2023, 177, 108358. [CrossRef]
96. Lv, Q.; Zheng, R.; Guo, X.; Larestani, A.; Hadavimoghaddam, F.; Riazi, M.; Hemmati-Sarapardeh, A.; Wang, K.; Li, J. Modelling
minimum miscibility pressure of CO2-crude oil systems using deep learning, tree-based, and thermodynamic models: Application
to CO2 sequestration and enhanced oil recovery. Sep. Purif. Technol. 2023, 310, 123086. [CrossRef]
97. Zhu, X.; Zhang, H.; Ren, Q.; Zhang, D.; Zeng, F.; Zhu, X.; Zhang, L. An automatic identification method of imbalanced lithology
based on Deep Forest and K-means SMOTE. Geoenergy Sci. Eng. 2023, 224, 211595. [CrossRef]
98. Chanchotisatien, P.; Vong, C. Feature engineering and feature selection for fault type classification from dissolved gas values in
transformer oil. In Proceedings of the ICSEC 2021—25th International Computer Science and Engineering Conference, Chiang
Rai, Thailand, 18–20 November 2021; pp. 75–80. [CrossRef]
99. de Jesus Rocha Santos, M.; de Salvo Castro, A.O.; Leta, F.R.; De Araujo, J.F.M.; de Souza Ferreira, G.; de Araújo Santos, R.; de
Campos Lima, C.B.; Lima, G.B.A. Statistical analysis of offshore production sensors for failure detection applications / Análise
estatística dos sensores de produção offshore para aplicações de detecção de falhas. Braz. J. Dev. 2021, 7, 85880–85898. [CrossRef]
100. Ali, M.; Zhu, P.; Jiang, R.; Huolin, M.; Ehsan, M.; Hussain, W.; Zhang, H.; Ashraf, U.; Ullaah, J.; Ullah, J. Reservoir characterization
through comprehensive modeling of elastic logs prediction in heterogeneous rocks using unsupervised clustering and class-based
ensemble machine learning. Appl. Soft Comput. 2023, 148, 110843. [CrossRef]
101. Salamai, A.A. Deep learning framework for predictive modeling of crude oil price for sustainable management in oil markets.
Expert Syst. Appl. 2023, 211, 118658. [CrossRef]
102. Ashayeri, C.; Jha, B. Evaluation of transfer learning in data-driven methods in the assessment of unconventional resources. J. Pet.
Sci. Eng. 2021, 207, 109178. [CrossRef]
103. Vuttipittayamongkol, P.; Tung, A.; Elyan, E. A Data-Driven Decision Support Tool for Offshore Oil and Gas Decommissioning.
IEEE Access 2021, 9, 137063–137082. [CrossRef]
104. Song, T.; Zhu, W.; Chen, Z.; Jin, W.; Song, H.; Fan, L.; Yue, M. A novel well-logging data generation model integrated with
random forests and adaptive domain clustering algorithms. Geoenergy Sci. Eng. 2023, 231, 212381. [CrossRef]
105. Awuku, B.; Huang, Y.; Yodo, N. Predicting Natural Gas Pipeline Failures Caused by Natural Forces: An Artificial Intelligence
Classification Approach. Appl. Sci. 2023, 13, 4322. [CrossRef]
106. Al-Mudhafar, W.J.; Abbas, M.A.; Wood, D.A. Performance evaluation of boosting machine learning algorithms for lithofacies
classification in heterogeneous carbonate reservoirs. Mar. Pet. Geol. 2022, 145, 105886. [CrossRef]
107. Wen, H.; Liu, L.; Zhang, J.; Hu, J.; Huang, X. A hybrid machine learning model for landslide-oriented risk assessment of
long-distance pipelines. J. Environ. Manag. 2023, 342, 118177. [CrossRef] [PubMed]
108. Otchere, D.A.; Ganat, T.O.A.; Nta, V.; Brantson, E.T.; Sharma, T. Data analytics and Bayesian Optimised Extreme Gradient
Boosting approach to estimate cut-offs from wireline logs for net reservoir and pay classification. Appl. Soft Comput. 2022, 120,
108680. [CrossRef]
109. Gamal, H.; Elkatatny, S.; Alsaihati, A.; Abdulraheem, A. Intelligent Prediction for Rock Porosity While Drilling Complex Lithology
in Real Time. Comput. Intell. Neurosci. 2021, 2021, 9960478. [CrossRef]
110. Ismail, M.F.H.; May, Z.; Asirvadam, V.S.; Nayan, N.A. Machine-Learning-Based Classification for Pipeline Corrosion with Monte
Carlo Probabilistic Analysis. Energies 2023, 16, 3589. [CrossRef]
111. Prasojo, R.A.; Putra, M.A.A.; Ekojono; Apriyani, M.E.; Rahmanto, A.N.; Ghoneim, S.S.; Mahmoud, K.; Lehtonen, M.; Darwish,
M.M. Precise transformer fault diagnosis via random forest model enhanced by synthetic minority over-sampling technique.
Electr. Power Syst. Res. 2023, 220, 109361. [CrossRef]
112. Ma, Z.; Chang, H.; Sun, Z.; Liu, F.; Li, W.; Zhao, D.; Chen, C. Very Short-Term Renewable Energy Power Prediction Using XGBoost
Optimized by TPE Algorithm. In Proceedings of the 2020 4th International Conference on HVDC (HVDC), Xi’an, China, 6–9
November 2020; pp. 1236–1241. [CrossRef]
113. Ma, S.; Jiang, Z.; Liu, W. Modeling Drying-Energy Consumption in Automotive Painting Line Based on ANN and MLR for
Real-Time Prediction. Int. J. Precis. Eng. Manuf. Technol. 2019, 6, 241–254. [CrossRef]
114. Guo, Z.; Wang, H.; Kong, X.; Shen, L.; Jia, Y. Machine Learning-Based Production Prediction Model and Its Application in
Duvernay Formation. Energies 2021, 14, 5509. [CrossRef]
115. Ibrahim, N.M.; Alharbi, A.A.; Alzahrani, T.A.; Abdulkarim, A.M.; Alessa, I.A.; Hameed, A.M.; Albabtain, A.S.; Alqahtani, D.A.;
Alsawwaf, M.K.; Almuqhim, A.A. Well Performance Classification and Prediction: Deep Learning and Machine Learning Long
Term Regression Experiments on Oil, Gas, and Water Production. Sensors 2022, 22, 5326. [CrossRef] [PubMed]
116. Yin, H.; Liu, C.; Wu, W.; Song, K.; Dan, Y.; Cheng, G. An integrated framework for criticality evaluation of oil & gas pipelines
based on fuzzy logic inference and machine learning. J. Nat. Gas Sci. Eng. 2021, 96, 104264. [CrossRef]
117. Chen, H.; Zhang, C.; Jia, N.; Duncan, I.; Yang, S.; Yang, Y. A machine learning model for predicting the minimum miscibility
pressure of CO2 and crude oil system based on a support vector machine algorithm approach. Fuel 2021, 290, 120048. [CrossRef]
118. Naserzadeh, Z.; Nohegar, A. Development of HGAPSO-SVR corrosion prediction approach for offshore oil and gas pipelines. J.
Loss Prev. Process. Ind. 2023, 84, 105092. [CrossRef]
119. Yuan, Z.; Chen, L.; Liu, G.; Shao, W.; Zhang, Y.; Yang, W. Physics-based Bayesian linear regression model for predicting length of
mixed oil. Geoenergy Sci. Eng. 2023, 223, 211466. [CrossRef]
120. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken,
NJ, USA, 2015.
121. McCuen, R.H. Modeling Hydrologic Change: Statistical Methods; CRC Press: Boca Raton, FL, USA, 2016.
122. Liu, J.; Zhao, Z.; Zhong, Y.; Zhao, C.; Zhang, G. Prediction of the dissolved gas concentration in power transformer oil based on
SARIMA model. Energy Rep. 2022, 8, 1360–1367. [CrossRef]
123. Li, X.; Guo, X.; Liu, L.; Cao, Y.; Yang, B. A novel seasonal grey model for forecasting the quarterly natural gas production in China.
Energy Rep. 2022, 8, 9142–9157. [CrossRef]
124. Rashidi, S.; Mehrad, M.; Ghorbani, H.; Wood, D.A.; Mohamadian, N.; Moghadasi, J.; Davoodi, S. Determination of bubble point
pressure & oil formation volume factor of crude oils applying multiple hidden layers extreme learning machine algorithms. J. Pet.
Sci. Eng. 2021, 202, 108425. [CrossRef]
125. Gong, X.; Liu, L.; Ma, L.; Dai, J.; Zhang, H.; Liang, J.; Liang, S. A Leak Sample Dataset Construction Method for Gas Pipeline
Leakage Estimation Using Pipeline Studio. In Proceedings of the International Conference on Advanced Mechatronic Systems
(ICAMechS), Tokyo, Japan, 9–12 December 2021; pp. 28–32. [CrossRef]
126. Chung, S.; Loh, A.; Jennings, C.M.; Sosnowski, K.; Ha, S.Y.; Yim, U.H.; Yoon, J.-Y. Capillary flow velocity profile analysis on
paper-based microfluidic chips for screening oil types using machine learning. J. Hazard. Mater. 2023, 447, 130806. [CrossRef]
[PubMed]
127. Mohamadian, N.; Ghorbani, H.; Wood, D.A.; Mehrad, M.; Davoodi, S.; Rashidi, S.; Soleimanian, A.; Shahvand, A.K. A
geomechanical approach to casing collapse prediction in oil and gas wells aided by machine learning. J. Pet. Sci. Eng. 2021, 196,
107811. [CrossRef]
128. Sabah, M.; Mehrad, M.; Ashrafi, S.B.; Wood, D.A.; Fathi, S. Hybrid machine learning algorithms to enhance lost-circulation
prediction and management in the Marun oil field. J. Pet. Sci. Eng. 2021, 198, 108125. [CrossRef]
129. Shi, J.; Xie, W.; Huang, X.; Xiao, F.; Usmani, A.S.; Khan, F.; Yin, X.; Chen, G. Real-time natural gas release forecasting by using
physics-guided deep learning probability model. J. Clean. Prod. 2022, 368, 133201. [CrossRef]
130. Machado, A.P.F.; Vargas, R.E.V.; Ciarelli, P.M.; Munaro, C.J. Improving performance of one-class classifiers applied to anomaly
detection in oil wells. J. Pet. Sci. Eng. 2022, 218, 110983. [CrossRef]
131. Zhou, J.; Liu, B.; Shao, M.; Yin, C.; Jiang, Y.; Song, Y. Lithologic classification of pyroclastic rocks: A case study for the third
member of the Huoshiling Formation, Dehui fault depression, Songliao Basin, NE China. J. Pet. Sci. Eng. 2022, 214, 110456.
[CrossRef]
132. Zhang, G.; Wang, Z.; Mohaghegh, S.; Lin, C.; Sun, Y.; Pei, S. Pattern visualization and understanding of machine learning models
for permeability prediction in tight sandstone reservoirs. J. Pet. Sci. Eng. 2021, 200, 108142. [CrossRef]
133. Zuo, Z.; Ma, L.; Liang, S.; Liang, J.; Zhang, H.; Liu, T. A semi-supervised leakage detection method driven by multivariate time
series for natural gas gathering pipeline. Process. Saf. Environ. Prot. 2022, 164, 468–478. [CrossRef]
134. Chen, Z.; Yu, W.; Liang, J.-T.; Wang, S.; Liang, H.-C. Application of statistical machine learning clustering algorithms to improve
EUR predictions using decline curve analysis in shale-gas reservoirs. J. Pet. Sci. Eng. 2022, 208, 109216. [CrossRef]
135. Fernandes, W.; Komati, K.S.; Gazolli, K.A.d.S. Anomaly detection in oil-producing wells: A comparative study of one-class
classifiers in a multivariate time series dataset. J. Pet. Explor. Prod. Technol. 2023, 14, 343–363. [CrossRef]
136. Gao, G.; Hazbeh, O.; Rajabi, M.; Tabasi, S.; Ghorbani, H.; Seyedkamali, R.; Shayanmanesh, M.; Radwan, A.E.; Mosavi, A.H.
Application of GMDH model to predict pore pressure. Front. Earth Sci. 2023, 10, 1043719. [CrossRef]
137. Cirac, G.; Farfan, J.; Avansi, G.D.; Schiozer, D.J.; Rocha, A. Deep hierarchical distillation proxy-oil modeling for heterogeneous
carbonate reservoirs. Eng. Appl. Artif. Intell. 2023, 126, 107076. [CrossRef]
138. Dayev, Z.; Shopanova, G.; Toksanbaeva, B.; Yetilmezsoy, K.; Sultanov, N.; Sihag, P.; Bahramian, M.; Kıyan, E. Modeling the flow
rate of dry part in the wet gas mixture using decision tree/kernel/non-parametric regression-based soft-computing techniques.
Flow Meas. Instrum. 2022, 86, 102195. [CrossRef]
139. Das, S.; Paramane, A.; Chatterjee, S.; Rao, U.M. Sensing Incipient Faults in Power Transformers Using Bi-Directional Long
Short-Term Memory Network. IEEE Sens. Lett. 2023, 7, 7000304. [CrossRef]
140. Gao, J.; Li, Z.; Zhang, M.; Gao, Y.; Gao, W. Unsupervised Seismic Random Noise Suppression Based on Local Similarity and
Replacement Strategy. IEEE Access 2023, 11, 48924–48934. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
