Sensors 24 04013
Review
A Review of Predictive Analytics Models in the Oil and
Gas Industries
Putri Azmira R Azmi 1 , Marina Yusoff 1,2,3, * and Mohamad Taufik Mohd Sallehud-din 4
Abstract: Enhancing the management and monitoring of oil and gas processes demands the develop-
ment of precise predictive analytic techniques. Over the past two years, oil and its prediction have
advanced significantly using conventional and modern machine learning techniques. Several review
articles detail the developments in predictive maintenance and the technical and non-technical aspects
influencing the uptake of big data. The absence of references for machine learning techniques
impacts the effective optimization of predictive analytics in the oil and gas sectors. This review paper
offers readers thorough information on the latest machine learning methods utilized in this industry’s
predictive analytical modeling. This review covers different forms of machine learning techniques
used in predictive analytical modeling from 2021 to 2023 (91 articles). It provides an overview of the
details of the papers that were reviewed, describing the model’s categories, the data’s temporality,
field, and name, the dataset’s type, predictive analytics (classification, clustering, or prediction), the
models’ input and output parameters, the performance metrics, the optimal model, and the model’s
benefits and drawbacks. In addition, suggestions for future research directions are given to provide
insights into the potential applications of the associated knowledge. This review can serve as a guide to
enhance the effectiveness of predictive analytics models in the oil and gas industries.
[Figure 1: share of predictive analytics categories. Prediction 34%, Classification 53%, Clustering 13%.]
Figure 1 illustrates the three categories of predictive analytics applied in the study
using ML and AI techniques. A little over 13% of clustering studies have employed modeling
methods. Many of these do not require clustering studies because there is enough
supervised labeling data, which leads to 53% of researchers favoring classification.
Recently, modern artificial intelligence models, such as ANN, Deep Learning (DL),
Fuzzy Logic, Decision Tree (DT), RF, and hybrid models have been implemented to model
the O&G domain, such as a review of 91 publications and a bibliography on the use of
AI in the O&G field. Figure 2 shows that, in recent decades, this field of research has
increased. Nevertheless, additional studies on predictive analytics models and datasets are
required to identify the suitability of the model and dataset for incorporating diverse
mathematical and statistical elements alongside heuristic and arithmetic methods. The use
of AI has been widely utilized in various fields, such as science [13–15], energy [16–18],
and economics [19–21]. Some examples include ML techniques [22–24], ensemble techniques
[25,26], soft computing techniques [27,28], statistical techniques [29], and fuzzy-based
systems [30]. The effective application of AI in several O&G domains, such as
gas [31], pipeline [32], crude oil [33], oxyhydrogen gas retrofit [34], and transformer oil [35],
has received increased interest in the last few years.
Predicting the performance and production of O&G has consistently presented a
challenge. The imperative to create resilient prediction methods is driven by the desire for
enhanced financial viability and superior technical outcomes [36]. As a critical sector, the
O&G industry faces complex challenges, ranging from volatile market conditions to
operational uncertainties and safety concerns. Predictive analytics has the potential to
transform operations, enhance efficiency, and mitigate risks.
Sensors 2024, 24, 4013 3 of 57
Predictive analytics offers a powerful toolset to address these challenges and unlock
numerous benefits. For instance, proactive decision-making by O&G engineers is made
possible by operational efficiency from real-time data analysis. This helps organizations spot
problems before they escalate, optimize resource utilization, and streamline processes. In
addition, cost reduction can help O&G companies be cost-effective by optimizing resource
allocation, reducing waste, and enhancing overall resource efficiency through insights
from predictive analytics. Numerous studies have explored and documented AI’s
effectiveness in modeling O&G over the last three years. Many initial efforts comprised basic
and conventional AI techniques, including perceptron-based Artificial Neural Networks
(ANNs) [37–39].
[Figure: "Total Publications By Year"; x-axis: Year (2021, 2022, 2023); total number of published papers = 91; yearly counts shown: 34, 32, and 25.]
temporal data from a buried gas pipeline, employing various algorithms with a combination
of ANN and metaheuristic models such as the Quantum Particle Swarm Optimization-
Artificial Neural Network (QPSO-ANN), Weighted Quantum Particle Swarm Optimization-Artificial
Neural Network (WQPSO-ANN), and Levy Flight Quantum Particle Swarm Optimization-
Artificial Neural Network (LWQPSO-ANN). The study focused on predicting crater width,
with important parameters for the prediction of buried pipelines, such as pipe diameter
(mm), operating pressure (MPa), cover depth (m), and crater width (m). In this work,
LWQPSO-ANN outperformed other methods by more than 95%.
Meanwhile, in another study on non-temporal pipeline conditions, a range of ML
algorithms, including ANN, Support Vector Machine (SVM), Ensemble Learning (EL), and
Support Vector Regression (SVR), were used [47]. Their investigation included elements
impacting corrosion defect depth, such as CO2 levels, temperature, pH, liquid velocity,
pressure, stress, glycol concentration, H2S levels, organic acid content, oil type, water
chemistry, and hydraulic diameter. The study emphasized the ANN, indicating that
it can capture the complex set of variables affecting pipeline corrosion.
In the complicated landscape of well-data analysis, Sami and Ibrahim [48] utilized non-
temporal datasets from Middle East fields, concentrating on vertical wells. Random Forest
(RF), k-nearest Neighbors (KNN), and ANNs were used to predict the flowing bottom-hole
pressure (Pwf) in vertical petroleum wells. The preference for the ANN spotlighted its
efficacy in modeling intricate relationships within well data, as underscored by evaluation
metrics such as the Mean Squared Error (MSE) and Coefficient of Determination (R2).
The R2 values of the proposed method for training and testing were 97% and 93%,
respectively, significantly higher than those of the other models implemented in the study.
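The two evaluation metrics named above, MSE and R2, can be illustrated with a minimal sketch. The pressure values below are hypothetical and are not taken from Sami and Ibrahim [48]; the function names are illustrative only.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between measured and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical flowing bottom-hole pressures (psi), for illustration only.
measured = [2500.0, 2650.0, 2400.0, 2800.0]
predicted = [2480.0, 2700.0, 2390.0, 2750.0]

mse = mean_squared_error(measured, predicted)
r2 = r_squared(measured, predicted)
print(f"MSE = {mse:.1f}, R2 = {r2:.4f}")
```

An R2 reported as a percentage, as in the studies reviewed here, corresponds to this value multiplied by 100.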
Moreover, Qayyum Chohan et al. [49] constructed non-temporal datasets using ML
algorithms like the ANN, Least Square Boosting (LSB), and Bagging for the prediction
of oil using 2600 samples from oil shales. The input parameters that were used in the
study are air molar flowrate, illite silica, carbon, hydrogen content, feed preheater temp,
and air preheater temp. The model achieved correlation coefficients of 99.6% for oil yield and
99.9% for carbon dioxide, with the Root Mean Squared Error (RMSE) as the evaluation metric,
emphasizing the applicability of ANNs in interpreting the factors
influencing oil yield and carbon dioxide emissions in complex processes. The suggested
model outperformed other models in terms of accuracy. A set of ML methods, including
NB+KNN, DT, RF, SVM, and ANN, were applied to 769 temporal data samples related to
ocean slick signs in the surrounding area of the exploration site [50]. The study’s emphasis
on ANNs amidst this array of algorithms underscored its pivotal role in discerning Sea-
Surface Petroleum Signatures. Although the specific parameters of the ocean slick signature
were not explicitly stated, the study spotlighted the ANN’s prowess in unraveling patterns
related to oil detection in dynamic ocean conditions with an accuracy of 90%. However, the
proposed model did not give significant results for classifying ocean slick signatures.
Several machine learning models were used in the study, including Partial Least
Squares (PLS), Deep Neural Network (DNN), Feature Projection Model (FPM), Feature
Projection-Deep Neural Network (FP-DNN), and Feature Projection-PLS (FP-PLS) [51]. The
study looked at long-distance pipelines without considering time. The dataset consisted
of 2093 samples, and the prediction task included characteristics such as the original total
oil length, inner dimensions, pipeline length, Reynolds number, comparable length, and
actual combined oil length. The assessment parameter employed was RMSE, and the DNN
model displayed an RMSE of 146%. The research showed that the error rate was the highest
and least convincing one, indicating that the model’s prediction accuracy must be increased.
Utilizing the ASPEN HYSYS V11 process simulator, Mendoza et al. [52] used non-temporal
analysis in crude oil processes. The study used the ANN and Genetic Algorithm (GA)
to predict critical variables such as feed flow rate, gas product pressure, interstage gas
discharge pressure, and centrifugal compressor isentropic efficiency, aiming to increase oil
production. The ANN+GA model improved the performance of the predicted variable.
Shifting the focus to gas-phase pollutants, Sakhaei et al. [53] performed non-temporal
research using proprietary data. The study used ANNs to estimate methanol, α-pinene,
and hydrogen sulfide concentrations for gas-phase contamination removal in OLP-BTF and
TLP-BTF. The ANN+PSO model, which used 104 samples, achieved an R2 of more than 99%,
indicating its effectiveness. Because the suggested model showed encouraging outcomes,
the authors were prompted to consider possible improvements for practical
implementations. ANN, Least Square Support Vector
Machine (LSSVM), and Multi-Gene Genetic Programming (MGGP) were utilized in reser-
voir engineering to analyze temporal data for gas-aided gravity drainage (GAGD) [54].
Compared to the suggested strategy, with various input parameters and 223 samples, the
ANN’s model showed 976% of R2 and 0.0520 of RMSE. In contrast, MGGP returned 89%
(R2 ) and 0.0846 (RMSE). The study demonstrated the superiority of the ANN technique in
reservoir prediction tasks.
Moving on to transformer fault diagnosis in the temporal domain, Mao et al. (2022)
investigated DGA datasets by combining Multivariate Time Series clustering approaches
and graph neural networks (GNNs). The study concentrated on clustering H2, CH4, C2H6,
C2H4, C2H2, CO, and CO2 using 1408 samples to diagnose power transformer defects.
The MTGNN model attained 92% accuracy, demonstrating its efficacy in
the spatiotemporal domain of power transformer fault detection. In the context of
non-temporal analysis within the field of crude oil, Wang et al. [33] studied contemporary
research, employing an ANN and a hybrid Multilayer Perceptron with Backpropagation
for prediction. The model used 172 samples and a variety of characteristics to estimate
diffusion coefficients, including temperature, pressure, liquid viscosity, gas viscosity, liquid
molar volume, gas molar volume, liquid molecular weight, gas molecular weight, and
interfacial tension. Although the training and testing R2 s were 88% and 89%, respectively,
the proposed Multilayer Perceptron with Backpropagation model had less accuracy, and
the hybrid technique did not deliver the expected improvement.
The study from Zhang et al. [55] experimented with the temporal crude oil and
transportation system data using the GA with a backpropagation neural network for
prediction. The model produced outstanding results with 509 samples, including numerous
factors linked to the system’s temperature, pressure, and consumption, achieving 99%
accuracy for energy and heat and 97% for power. The GA with a backpropagation neural
network was highly effective in predicting the complicated dynamics of the crude oil
system. In cooperation with the Egyptian General Petroleum Corporation (EGPC), Ismail
et al. [56] conducted a temporal study of drilling activities. The model used Multilayer
Perceptron (MLP) and the ANN for grouping and classification tasks based on epochs, age,
formation, lithology, and fields for predicting gas routes and chimneys. Surprisingly, the
MLP model achieved an RMSE of 0.10, indicating decreased error rates and surpassing
other approaches for predicting drilling-related occurrences.
The Extreme Learning Machine (ELM), Elastic Net Linear, Linear Support Vector
Regression (Linear-SVR), Multivariate Adaptive Regression Spline, Artificial Bee Colony,
Particle Swarm Optimization (PSO), Differential Evolution, Simple Genetic Algorithm, Grey
Wolf Optimizer (GWO), and Exponential Natural Evolution Strategies (xNES) are some
of the models that Goliatt et al. [57] used in the temporal domain of shale gas exploration
within the YuDong-Nan shale gas field. To estimate total organic carbon, the DE+ELM
hybrid model produced an acceptable RMSE of 0.497 when predicting factors such as
clay, K-feldspar, pyrite, and other elements. Nevertheless, GWO did not outperform the
other approaches. In the temporal field of reservoir engineering, specifically within the
North Sea’s “Gullfaks” field, an MLP-LMA model was suggested by Amar et al. [58] to produce
predictions for half-cycle time, shutdown, water alternating gas injection, and the amount of
gas and water injected. The proposed approach outperformed the other two proxy models,
achieving higher accuracy and much shorter simulation times. Table 1 lists research articles
on predictive analytics in the O&G field using ANN models.
Table 1. A list of research articles on predictive analytics in the O&G field using ANN models.
Figure 3 shows the processes of the input series in both backward and forward directions.
Bi-LSTM models can learn from the entire sequence context by collecting information
about each sequence element from the past and future. They are highly suited for temporal
data and producing precise predictions of ions in the sequence [62].
There are two transfer states in the LSTM model from Figure 3: a hidden state ($h_t$) and
a cell state ($c_t$) [62]. The passed $c_t$ changes quite slowly. The output $c_t$ is passed from
$c_{t-1}$ in the previous state, with some added values [62]. However, there are typically
significant variances in $h_t$ among nodes. The LSTM model used the current input of $x_t$ and
$h_{t-1}$ from the previous state to generate four states. Furthermore, $z_f$, $z_i$, and $z_o$ are
accessible to a gating-control state with values between 0 and 1, derived by multiplying the
splicing vector by the weight matrix and converting it by a sigmoid activation function. The
tanh activation function converts $z$ to a value between −1 and 1 [62].
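The gating description above can be sketched as a single LSTM time step. This is a minimal illustration rather than the implementation used in [62]: it assumes the standard LSTM update rules (cell state c_t = z_f * c_(t-1) + z_i * z and hidden state h_t = z_o * tanh(c_t)), omits bias terms, and uses hypothetical weight matrices named W_f, W_i, W_o, and W_z.

```python
import math


def sigmoid(value):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_o, W_z):
    """One LSTM step without biases (assumed standard formulation)."""
    spliced = x_t + h_prev  # splicing vector: current input joined with previous hidden state
    z_f = [sigmoid(a) for a in matvec(W_f, spliced)]  # forget gate, values in (0, 1)
    z_i = [sigmoid(a) for a in matvec(W_i, spliced)]  # input gate, values in (0, 1)
    z_o = [sigmoid(a) for a in matvec(W_o, spliced)]  # output gate, values in (0, 1)
    z = [math.tanh(a) for a in matvec(W_z, spliced)]  # candidate values in (-1, 1)
    # Cell state: previous cell state scaled by the forget gate, plus gated new values.
    c_t = [f * c + i * g for f, c, i, g in zip(z_f, c_prev, z_i, z)]
    # Hidden state: output gate applied to the squashed cell state.
    h_t = [o * math.tanh(c) for o, c in zip(z_o, c_t)]
    return h_t, c_t


# Hypothetical one-dimensional example with all-zero weights, so every gate equals 0.5.
zero_weights = [[0.0, 0.0]]
h_t, c_t = lstm_step([1.0], [0.0], [0.5],
                     zero_weights, zero_weights, zero_weights, zero_weights)
print(h_t, c_t)
```

With zero weights the forget gate halves the previous cell state and the candidate contributes nothing, which makes the slow change of the cell state relative to the hidden state easy to see by hand.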
This interest in Deep Learning is exemplified by a series of significant studies show-
casing its applications. The success of MLSTM in this context was evident through robust
evaluation metrics such as MAE and RMSE. Building on this, Werneck et al. [63] extended
the 301 samples of temporal analysis to oil wells from the Metro Interstate Traffic Volume,
Appliances Energy Prediction, and UNISIM-II-M-CO datasets, utilizing LSTM, Gated Re-
current Unit (GRU), and LSTM + Seq2Seq architectures for predicting oil production and
pressure. The parameters used in the study to predict oil production and pressure are
pressure (bottom-hole), water cut, gas–oil ratio, and gas–liquid ratio, which are considered
in the ratios between fluid production (oil, gas, and water). Symmetric Mean Absolute Per-
centage Error (SMAPE), RMSE, and MAE are evaluation measures that demonstrate how
well the models capture the dynamic characteristics of reservoirs. The LSTM + Seq2Seq
and GRU2 architectures are the best models that the researchers have proposed because of
the higher accuracy achieved. Nevertheless, the researchers recommend that future studies
include another metaheuristic method, such as the GA.
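The three error measures named above (SMAPE, RMSE, and MAE) can be computed as in the following sketch. SMAPE has several variants in the literature; the form assumed here divides each absolute error by the mean of the absolute actual and predicted values, and may differ from the one used by Werneck et al. [63]. The production values are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (assumed variant)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denominator = (abs(t) + abs(p)) / 2.0
        if denominator > 0:  # a pair of zeros contributes no error
            total += abs(p - t) / denominator
    return 100.0 * total / len(y_true)


# Hypothetical oil production rates, for illustration only.
actual = [100.0, 200.0, 300.0]
forecast = [110.0, 190.0, 300.0]

mae_value = mae(actual, forecast)
rmse_value = rmse(actual, forecast)
smape_value = smape(actual, forecast)
print(f"MAE = {mae_value:.3f}, RMSE = {rmse_value:.3f}, SMAPE = {smape_value:.3f}%")
```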
In 2022, Wang et al. [61] shifted the focus to the Longmaxi Formation of the Sichuan
Basin with 90,000 data samples for predicting the real-time pipeline crack. The study
proposed the DCNN + LSTM, ANN, LSTM, Recurrent Neural Network (RNN), and SVR
models for natural gas pipelines. The model showcases the impressive performance of
the DCNN + LSTM with an accuracy of 99.37%, emphasizing the significance of LSTM
in predicting shale gas production with robust evaluation metrics in the temporal well
data setting. Antariksa et al. [64] used the West Natuna Basin dataset, which contains
11,497 samples, aligned with input parameters, such as deep and shallow resistivities (LLD
and LLS), sonic (Vp), neutron-porosity (NPHI), density (RHOB), and gamma ray (GR), and
one output parameter, well log data imputation, to apply LSTM and RF models to predict
hydrocarbon production in the gas sector. This demonstrates that LSTM may be applied
to the gas output forecast using metrics like R2 , RMSE, and MSE. The suggested model
provides 94% more accuracy.
Another study explored the classification of non-temporal oil transformers using
the DGA local power utilities and IEC TC10 datasets with 1530 samples. The research
utilized KNN, SVM, and Extreme Gradient Boosting (XGBoost) to evaluate the model’s
performance using measures including accuracy, precision, and recall. Combining the
Synthetic Minority Oversampling Technique (SMOTE) with KNN (KNN+SMOTE) achieved
accuracies of 98% and 97% on the DGA and IEC TC10 datasets,
respectively [65]. Barjouei et al. [66] studied non-temporal data
from the Soroush and South Iran oil fields, analyzing 7245 samples and predicting factors
such as choke size (D64), wellhead pressure (Pwh), oil specific gravity (γo), gas/liquid
ratio, and wellhead choke. The study compared several models, namely DL, DT,
RF, ANNs, and SVR, revealing the superior performance of DL, which achieved a higher
R2 (99%) than the other models. Together, these studies highlight the adaptability of
Deep Learning methods to handle temporal and non-temporal data in various O&G sector
applications. The insights derived from these endeavors, specifically focusing on Deep
Learning, contribute significantly to optimizing operations and decision-making processes
in this critical industry.
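The oversampling step in the KNN+SMOTE approach reported in [65] can be illustrated with a simplified sketch of the SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function name, parameters, and data below are hypothetical, and the sketch is not the implementation used in the reviewed study.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation.

    Assumes at least two minority samples, each a list of floats.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (squared Euclidean distance).
        others = [s for s in minority if s is not base]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority class (e.g., scaled gas concentrations).
minority_samples = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0]]
new_samples = smote_sketch(minority_samples, n_new=4)
print(new_samples)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays within the range already covered by that class, which is why the technique is usually applied to the training split only.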
A temporal reservoir study focused on the Volve and UNISIM-IIH oil fields
and utilized Long Short-Term Memory (LSTM) and GRU models for the classification
of 3257 samples based on oil, gas, water, or pressure levels [67]. With an R2 of 99%,
the GRU model emerged as the leading model for O&G forecasting. This accuracy
demonstrates the effectiveness of the suggested GRU model in predicting O&G activity
within the given reservoir setting. In a non-temporal analysis within the well domain,
Wang et al. [68] applied various Faster R-CNN models, including Faster R-CNN_Res50,
Faster R-CNN_Res50_DC, and Faster R-CNN_Res50_FPN, along with methods involving
Edge detection and Cluster+Soft-NMS, utilizing Google Earth Imagery encompassing
439 samples. Their goal was to organize oil wells depending on breadth and height. The
Faster R-CNN model with ClusterRPN obtained 71% precision. It is important to note that
the suggested approach was less than 90% accurate and required more time to run than
other models. Table 2 includes the published research on Deep Learning models for O&G
predictive analytics.
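The Soft-NMS step mentioned above for post-processing detected boxes can be sketched as follows. The sketch assumes the Gaussian variant of Soft-NMS, in which the score of each overlapping box is decayed by exp(-IoU^2 / sigma) instead of the box being discarded outright; the parameter values and boxes are hypothetical, and the configuration used by Wang et al. [68] is not stated in this review.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS: returns (box, score) pairs in the order they are selected."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Select the highest-scoring remaining box.
        best_index = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best_index)
        kept.append((best_box, best_score))
        # Decay the scores of the other boxes according to their overlap with it.
        rescored = []
        for box, score in remaining:
            decayed = score * math.exp(-(iou(best_box, box) ** 2) / sigma)
            if decayed >= score_threshold:
                rescored.append((box, decayed))
        remaining = rescored
    return kept


# Hypothetical detections: two heavily overlapping boxes and one separate box.
detections = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (20.0, 20.0, 30.0, 30.0)]
confidences = [0.9, 0.8, 0.7]
kept = soft_nms(detections, confidences)
print(kept)
```

In this example the second box is retained but with a reduced score, so it ranks below the non-overlapping third box; this behaviour is what distinguishes Soft-NMS from hard suppression when objects such as neighbouring wells sit close together.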
Table 2. Summary of the published research on Deep Learning models for predictive analytics in the O&G field.
Table 3. Published research on Fuzzy Logic and Neuro-fuzzy modeling in predictive analytics in the O&G field.
(LR), Decision Tree (DT), RF, and Adaboost with a temporal perspective. The assessment
measures used were F1 score and accuracy, with a particular emphasis on DT, which reached
a significant accuracy of 97%. However, feature selection increased training time rather than
improving accuracy. Notably, the proposed technique struggled to categorize class 2
due to limited data availability and label disputes based on estimated attributes. The other
study focused on using the same dataset and utilized one-directional, CNN, RF, Graph
Neural Network (GNN), and QDA models [87]. RF achieved a mean accuracy of 95%. The
evaluation measures used were F1 score, accuracy, precision, and recall. Specifically, the
study discovered that increasing the number of time frames enhanced mean accuracy. On
the other hand, the temporal analysis of well data completed by Brønstad et al. [88] focused
on 3W wells. The work employed ML models, namely RF and PCA. The combination of
RF and PCA achieved an accuracy of 90%. The accuracy of the suggested strategy was over
95% in each of the distinct classes, indicating that it is a valuable way to identify several
anomalous occurrences in well data.
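The classification measures used in these studies (accuracy, precision, recall, and F1 score) can be derived from predicted and true labels as in the sketch below. The labels are hypothetical and are not drawn from the 3W dataset; precision, recall, and F1 are computed for a single class treated as positive.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return (accuracy, precision, recall, f1) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# Hypothetical well-event labels, for illustration only.
true_labels = ["normal", "fault", "fault", "normal", "fault", "normal"]
predicted_labels = ["normal", "fault", "normal", "normal", "fault", "fault"]

accuracy, precision, recall, f1 = classification_metrics(
    true_labels, predicted_labels, positive="fault"
)
print(accuracy, precision, recall, f1)
```

Reporting per-class values alongside overall accuracy matters for datasets with rare event classes, where a high overall accuracy can hide poor recall on the minority class.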
Ben Jabeur et al. [89] used LGBM, CatBoost, XGBoost, RF, and a neural network to
assess a dataset of 2687 samples connected to the temporal characteristics of WTI crude
oil prices. The categorization challenge involved forecasting the movement of numerous
financial indicators in connection to oil prices, including green energy resources, commodities
such as gold, silver, petroleum, soybeans, platinum, and copper, the Dollar Index, the
Volatility Index, the Euro, the USD, and the Bitcoin. Accuracy and Area Under the Curve
(AUC) were utilized as the assessment criteria. LGBM and RF fared better than the other
algorithms in the research. The data imply that the suggested strategy is superior to
established methods in forecasting complicated connections. Hassan Baabbad et al. [90]
investigated the prediction of CO2 levels in shale gas reserves, emphasizing non-temporal
factors. The study used ML algorithms like GB, RF, and Multiple Linear Regression (MLR)
on a dataset of 1400 samples with a variety of features such as horizontal wellbore length,
hydraulic fracture length, reservoir length, SRV fracture porosity, SRV fracture permeability,
SRV fracture spacing, total production time, and fracture pressure. The performance
was examined using MSE, and RF outperformed the other ML algorithms in forecasting
CO2 levels in shale gas reserves.
Alsaihati et al. [91] evaluated RF, ANNs, and Fuzzy Networks (FNs) on real-time well
data with 8983 samples. The models were used to estimate torque and drag using
attributes including weight on bit, rotating velocity, standpipe pressure, hook load, and
penetration rate. The assessment measures used were the correlation coefficient (R) and
average absolute error percentage (AAPE). Based on the study, the recommended approach
predicted torque and drag during drilling operations more accurately, and the RF model
outperformed the other two models. Next, Kumar and
Hassanzadeh’s [92] work focused on the temporal elements of reservoir modeling utilizing
a 2D STARS simulation. The study’s goal was to forecast the efficacy of shale barriers in the
context of reservoir dynamics, and the ML technique used was RF. The dataset included
240 samples, including predictor factors such as effective formation compressibility, volu-
metric heat capacity, and thermal conductivity for rock, water, oil, and gas. The assessment
measures used were R2 and RMSE, with RF proving effective. The authors suggested
enhancing the proposed technique by including more training data and features,
highlighting the prospect of improving the model’s prediction performance with a larger
dataset and more relevant characteristics.
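The correlation coefficient (R) and average absolute error percentage (AAPE) used in the torque-and-drag study [91] can be computed as in this sketch. It assumes R is the Pearson correlation coefficient and that AAPE is the mean of the absolute relative errors expressed in percent; the torque values are hypothetical.

```python
import math


def aape(y_true, y_pred):
    """Average absolute percentage error, in percent. Assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical measured and predicted torque values, for illustration only.
measured_torque = [10.0, 12.0, 15.0, 20.0]
predicted_torque = [11.0, 12.0, 14.0, 21.0]

aape_value = aape(measured_torque, predicted_torque)
r_value = pearson_r(measured_torque, predicted_torque)
print(f"AAPE = {aape_value:.3f}%, R = {r_value:.4f}")
```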
In addition, Ma et al. [93] completed a non-temporal analysis to forecast burst pressure
in full-scale corroded O&G pipelines. The study utilized RF, XGBoost, SVM, and LGBM.
The dataset included 314 samples with predictor factors such as depth, length, breadth,
wall thickness, pipe diameter, steel grade, and burst pressure. The assessment measures
employed were R2 , RMSE, MAE, and MAPE. XGBoost achieved an R2 of 99% in training
and 98% in testing. The data suggested that the hybrid proposed model, presumably a
blend of two models, attained much higher levels. The research by Canonaco et al. [94]
LGBM outperformed the other ML models, including XGBoost, RF, LR, SVM, NB, the KNN,
and DT, for the classification task concerning fault type identification. F1 score, accuracy,
precision, and recall were among the evaluation measures for model performance, and the
LGBM achieved an accuracy of 87.06%. The study concluded that the model, particularly
the LGBM, demonstrated a high level of competence in fault type classification based on
the DGA data. However, the model’s accuracy still needs to be improved.
The non-temporal analysis study by Tewari et al. [8] focused on drilling operations,
particularly drill bit selection in Norwegian wells. The researchers used several ML models,
including Adaboost, RF, the KNN, NB, MLP, and the SVM. A wide range of drilling-
related features were included in the dataset, including 4312 samples with the following
characteristics: torque, standpipe pressure, mud weight, true vertical depth, weight on bit,
measured depth, penetration rate, revolutions per minute, bit type, bit size, d-exponent,
total flow area, mechanical specific energy, depth of cut, and aggressiveness of the drill bit.
The primary classification focused on drill bit selection, and the RF model demonstrated an
impressive accuracy of 91% in testing and 97% in training. The study’s considerable results
show that the proposed method is more stable, accurate, and dependable than the other
models used in drill bit selection in Norwegian wells.
The research by Santos et al. [99] employed a temporal exploration centered around
well data, specifically focusing on 3W wells. The researchers’ approach involved the
application of an RF model for classification, utilizing a dataset encompassing 1984 samples.
The dataset included crucial parameters such as the gas lift choke pressure, downstream
temperature, and gas lift flow. Their model’s performance was evaluated using metrics like
accuracy, faulty-normal accuracy (FNACC), and real faulty-normal accuracy (RFNACC),
showcasing an impressive accuracy rate of 94%. The study concludes by emphasizing the
efficacy of their proposed method in successfully identifying early faults in the well data.
The hybrid technique, K-Means+RF, performed admirably with R2 values ranging
from 92% to 98%, outperforming various baseline approaches in the study, such as using
the SVM, Local Outlier Factor (LOF), Local Factor, and RF. The study performed a temporal
analysis of reservoir data [100] to cluster sonic (DTC) using the 37 samples from the well
log. The features included depth, gamma ray, shallow resistivity, deep resistivity, neutron,
density, and CALI. Regarding the temporal analysis of well data from the United States,
which has a large field and well-scale, RF was used for clustering barrel of oil equiva-
lent [101]. This experiment used 934 samples, and the features included API, stream date,
surface latitude and longitude, formation thickness, TVD, lateral length, total proppant
mass, total injected fluid volume, API gravity, porosity, permeability, TOC, Vclay, rate of
oil production, gas production, water production, GPI, and frac fluid. Nonetheless, the
research brought attention to the necessity of increasing the accuracy since the RF model’s
testing and training RMSE values were 17.49% and 7.25%, respectively, suggesting potential
overfitting.
The study used various prediction models through temporal research, including
LSTM, AdaBoost, LR, SVR, the DNN, RF, and adaptive RF [102], focusing on crude oil
data. The employment of adaptive RF in the study shows that the model performed with
MAPE, MAE, MSE, RMSE, R2 , and Explained Variance Score (EVS) values of 112.31%, 52%,
53%, 73%, 99%, and 99%, respectively, outperforming other models. Based on the study’s
findings, it is critical to weigh the advantages and disadvantages of the proposed model
because it has a longer running time than the other models used in the study. Another study
employed RF in their experiment to classify the decommissioning options in the O&G
field and utilized 1846 samples from the public O&G dataset [103]. The study reported
two accuracies, one with the full feature set and one with redundant features removed,
comparing RF, KNNs, NB, DT, and NNs. The highest accuracies, both obtained by RF,
were 80.06% and 80.66%, respectively. However, the suggested approach must be improved
because the accuracy was less than 90%.
Following the non-temporal analysis of well logging data, RF with Analog-to-digital
converters was used for clustering, with 100 samples and features, including neutron
(CNL), gamma ray (GR), density (DEN), and compressional slowness (DTC) [104]. The
study’s RMSE (9%), MAE (6%), MAPE (0.031%), and MSE (86%) values indicate that the
clustering task’s accuracy might be improved. Further, using pipeline data with climate
change components, the study employed the KNN, Multilayer Perceptron Neural Network,
multiclass SVM, and XGBoost model to classify temporal analysis [105]. The features
included temperature, humidity, and wind speed from 81 samples. The XGBoost model
outperformed the other models with an accuracy of 92%, leaving room for additional improvement.
Al-Mudhafar et al. [106] worked on well data using LogitBoost, GB, XGBoost, AdaBoost,
and the KNN for classification with lithofacies and a well log dataset of 399 samples,
which takes into account the following parameters: gamma ray (GR), caliper (CALI), neutron
(NEU), sonic transit time (DT), bulk density (DEN), deep resistivity (RES DEP), shallow
resistivity (RES SLW), total porosity (PHIT), and water saturation (SW). The XGBoost model
performed admirably, surpassing other techniques with a Total Percent Correct (TPC) accuracy
measure of 97%. Subsequently, Wen et al.’s [107] study on a non-temporal pipeline
dataset used recursive feature elimination and particle swarm optimization-AdaBoost
for clustering. The collection included 3986 samples with information about landslide
risk and long-distance pipelines and consisted of a few parameters, which were landslide
susceptibility area (km2 ), percentage (%), and historical landslides (number). The model
attained 90% accuracy during training and 83% accuracy during testing, indicating that the
proposed clustering strategy must be improved in terms of accuracy.
Otchere et al. [106,108] focused on the reservoir domain, using the non-temporal Equinor
Volve Field datasets with two models: Bayesian Optimization with XGBoost
(BayesOpt-XGBoost) and XGBoost.
The dataset comprised 2853 samples, and the classification task involved DT, GR, NPHI, RT,
and RHOB as features, aiming to predict Vshale, porosity, and water saturation (Sw). The
evaluation metrics encompassed RMSE and MAE. The BayesOpt-XGBoost model achieved
an overall accuracy of 93%, with a precision of 98%, a recall of 86%, and a combined F1
score of 93%. Despite these encouraging outcomes, the research indicates that there may be
room for improvement in the model’s performance as the suggested approach may not be
reliable enough to forecast every output variable. Lastly, a study in the temporal drilling
analysis, which used RF and DT, emphasized the need for data confidentiality [109]. The
prediction task used weight on drill string rotation speed, rate of penetration, and pump
rate as secret features to forecast rock porosity. The RF model performed exceptionally well,
with an accuracy of 99% in training and 90% in testing, demonstrating its durability and
dependability in handling sensitive drilling data. The literature on the use of DT, RF, and
hybrid models is compiled in Table 4.
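Several of the classification results above are reported as precision, recall, and a combined F1 score. As a brief worked illustration, F1 is the harmonic mean of precision and recall; the sketch below assumes both are given as fractions between 0 and 1, and the function name `f1_score` is chosen here for illustration only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall quoted in the text for BayesOpt-XGBoost.
print(round(f1_score(0.98, 0.86), 3))  # prints 0.916
```

With the rounded figures quoted in the text, the harmonic mean comes to about 92%, close to the reported 93%; the small gap could reflect rounding in the published precision and recall.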
Table 4. Summary of the literature on the application of decision tree, random forest, and hybrid models.
the others exceptionally, with an R2 of 99%, RMSE of 0.0099, MSE of 9.84 × 10−5 , MAE of
0.008, RSE of 0.001, and EVS of 0.955.
The models used in a study by Yuan et al. [119] were Gradient Boosting DT, Physics-Based
Bayesian Linear Regression (PBBLR), Bayesian Linear Regression (BLR), and ANN,
applied to non-temporal data in the pipeline domain. With 728 samples from the Supervisory
Control and Data Acquisition (SCADA) system, the models attempted to predict factors
such as the original length of mixed oil, transportation distance, diameter, and Reynolds
number. Although PBBLR is regarded as a superior method, the assessment metrics,
i.e., RMSE, MAE, and R2 , indicate that the accuracy should be improved. The proposed
model could benefit from additional improvements. These collective studies showcase
the versatile applications of AI models in addressing crucial challenges within the O&G
industry, encompassing diverse aspects such as predicting pipeline corrosion, gas well
parameters, natural gas pipeline failures, and O&G production outcomes. Incorporating
innovative optimization techniques underscores the industry’s commitment to harnessing
advanced technologies for enhanced operational efficiency and robust risk management
strategies. Table 5 contains previous research published on interrelated AI models for
predictive analytics in the O&G field.
Table 5. Previous research published on interrelated AI models for predictive analytics in the O&G field.
Liu et al. [122] delved into the application of seasonal autoregressive integrated moving
average (SARIMA), LSTM, and autoregressive (AR) models. The researchers focused on
transformers, using a DGA dataset of 610 samples and considering parameters like H2, CH4,
C2H4, C2H6, CO, CO2, and
total hydrocarbon (TH) to predict dissolved gas concentrations. The evaluation metric, i.e.,
the Accuracy Relative Error (ARE), highlighted the SARIMA model’s efficacy in capturing
seasonal variations and long-term dependencies within the transformer DGA dataset.
Yang et al. [62] extended the exploration of statistical methods in wells, employing LSTM
and ARIMA models. Concentrating on the Longmaxi Formation of the Sichuan Basin
with 3650 data samples, they used date and daily production data to forecast shale gas
production. The evaluation metrics, including MAE, RMSE, and R2 , demonstrated the
effectiveness of LSTM in capturing temporal dependencies and ARIMA in handling time
series forecasting tasks. However, the model’s accuracy was 63% and needs improvement.
Moreover, Xuemei Li et al. [123] contributed to the field of statistical methods, specifically
examining the Grey Model (GM), Fractional Grey Model (FGM), Data Grouping-Based
Grey Modeling Method (DGGM), ARIMA, PSO for Grey Model (PSOGM), and PSO-based
data grouping grey model with fractional order accumulation (PSO-FDGGM). Their study,
focusing on natural gas in China, aimed to predict natural gas production during training.
MAPE served as the evaluation metric, and PSO-FDGGM proved effective in
optimizing the statistical models for accurate predictions, achieving a MAPE of 3.19%.
The model’s performance is noteworthy and reliable.
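To make the time series models in this paragraph more concrete, the sketch below fits the simplest member of the family, a first-order autoregressive model y[t] = c + phi * y[t-1], by ordinary least squares and rolls it forward. It is an illustrative assumption, not the method of the cited studies: ARIMA and SARIMA add differencing, moving-average, and seasonal terms that are omitted here, and the function names and the toy series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x = series[:-1]  # lagged values y[t-1]
    y = series[1:]   # current values y[t]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted recursion to produce `steps` future values."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        forecasts.append(value)
    return forecasts


# Hypothetical daily production series generated by y[t] = 2 + 0.5 * y[t-1].
history = [10.0, 7.0, 5.5, 4.75, 4.375]
c, phi = fit_ar1(history)
next_two = forecast_ar1(history[-1], c, phi, steps=2)
```

Because the toy series follows the recursion exactly, the fit recovers c = 2 and phi = 0.5; on real production data the residuals would not vanish, and metrics such as MAPE would quantify the remaining error.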
Collectively, these studies underscore the diverse applications of statistical methods in
predictive analytics for the O&G sector. The SARIMA, LSTM, ARIMA, GM, FGM, DGGM,
AR, PSOGM, and PSO-FDGGM are recognized as effective tools for handling temporal
dependencies, forecasting production, and optimizing model parameters. The specifics
of the data and the nature of the predictive analytics work determine which statistical
approaches are best, highlighting the need for a customized strategy in the O&G sector.
Table 6 highlights previous studies on a statistical model for predictive analytics modeling
in the O&G field.
Table 6. Previous studies on statistical models for predictive analytics modeling in the O&G field.
Furthermore, Chung et al. [126] investigated PCA, SVM, and LDA for temporal
predictions in oil. Their study utilized 30 real-time oil samples, where the pore size (R)
remained constant and the capillary flow rate (l2/t) was a function of interfacial properties
(γLG and θ) and viscosity (µ), to predict oil types.
The evaluation metric used was accuracy, emphasizing the capability of the SVM to capture
the underlying patterns in the temporal dataset, with a prediction accuracy of 90%. In the
experiment by Mohamadian et al. [127], the analysis focused on a non-temporal well-log
dataset from three drilled wellbores. The researchers employed ML models, specifically
Multilayer Perceptron with PSO (MLP-PSO) and Multilayer Perceptron with GA (MLP-GA),
for the prediction task involving variables such as depth, compressional wave velocity (Vp),
shear wave velocity (Vs), bulk density (ρ), and pore pressure (Pp), with the target being the
probable depth of casing collapse. The dataset included 22,323 samples, and the evaluation
metrics comprised R2 and RMSE. The performance of the proposed method indicates that
the accuracy of the MLP-PSO model outperformed that of the other models.
Next, the research by Sabah et al. [128] concentrated on drilling activity utilizing non-
temporal data from 305 wells drilled and located in the Marun oil field. The researchers
tested several ML models, including the hybridization of the Least Square Support Vector
Machine (LSSVM) with COA, PSO, and GA, MLP-COA, MLP-PSO, MLP-GA, LSSVM, and
MLP, to predict parameters such as northing, easting, depth, meterage, time of drilling,
formation type, size of hole, weight on bit, flow rate, weight of mud, MFVIS, retort solid,
pore pressure, fracture pressure, fan 600/fan 300, Gel 10min/Gel 10s, pump pressure, and
rpm. The goal variable was the severity of mud loss. The MLP-GA model had an RMSE
of 93%, while the suggested model was accurate. Shi et al. [129] used a Hybrid-Physics
Guided-Variational Bayesian Spatial-Temporal Neural Network to analyze natural gas
across time. The study aimed to forecast natural gas concentrations using a dataset of
600 samples. The predictor variables were geometry size, release point position, release
diameter, released gas, volumetric release rate, duration, and sensor placement. The
R2 value was used as an evaluation metric, and the Hybrid-Physics Guided-Variational
Bayesian Spatial-Temporal Neural Network achieved an R2 of 99%. The findings imply that
the model enhanced the spatiotemporal forecasting performance.
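Several of the hybrids above (MLP-PSO, LSSVM with PSO) pair a learner with particle swarm optimization. The sketch below shows only the generic PSO search loop on a toy objective; it is an assumption-laden illustration, not the configuration of any cited study. In a hybrid model the objective would typically be a validation error expressed as a function of the weights or hyperparameters being tuned. The function name, the parameter values (inertia 0.7, acceleration coefficients 1.5), and the test objective are all hypothetical choices.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)

    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests and the swarm-wide best seen so far.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Keep the particle inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


# Toy objective with its minimum at (1, -2); stands in for a validation loss.
def toy_loss(p):
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


best_point, best_loss = pso_minimize(toy_loss, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

The fixed seed makes the run repeatable. Variants such as QPSO change the position update rule but keep the same idea of particles sharing their best-known solutions.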
Furthermore, the temporal analysis focused on well data, specifically within the
context of 3W wells by Machado et al. [130]. The research involved the application of LSTM
and One-Class Support Vector Machine (OCSVM) models for classification, utilizing a
dataset comprising 1984 samples. The classification task aimed to identify the following
types of faults: P-PDG, P-TPT, T-TPT, P-MON-CKP, and T-JUS-CKP. The evaluation metrics
included recall, specificity, and accuracy, with the OCSVM model achieving an accuracy
of 91%. The study found that feature selection did not improve classifier accuracy, and
the proposed model demonstrated a lack of robustness in effectively classifying the two
types of faults in the well data. The temporal analysis of the research by Carvalho et al. [10]
focused on well data, specifically 3W wells. The study used ML models such as Ordered
Nearest Neighbors, Weighted Nearest Neighbors, LDA, and QDA to perform a classification
job with 1984 samples. The classification sought to forecast flow instability by detecting
events like P-PDG, P-TPT, T-TPT, P-MON-CKP, T-JUS-CKP, and CLASS. The evaluation
measures included recall, specificity, and accuracy, with the ONN reaching an accuracy
of 81%. However, the study’s author recommended looking into different metaheuristic
methodologies, indicating a possibility for better performance in forecasting flow instability
from the well data.
In the study by Zhou et al. [131], the analysis in the reservoir domain was conducted
with DT and SVM models on high-resolution non-temporal Formation Micro-Imager (FMI)
data. The classification task aimed to categorize how logging units react to sedimentary
pyroclastic rock, regular pyroclastic rock, and pyroclastic lava for lithologically classifying
pyroclastic rocks. The SVM model had an impressive accuracy of 98.6%, surpassing
the threshold of 95%. The study emphasized the efficacy of the suggested model in
Table 7. Previous works on the application of ML models for predictive analytics modeling in O&G fields.
accuracies, the use of 15 variables produces better outcomes than the five-variable
models. Previous research publications may be found in Table 9.
• Table 10 summarizes the input parameters for a well logging predictive analytics
model. The researchers commonly used 14 parameters for well logging, including
gamma ray (GR), sonic (Vp), deep and shallow resistivities (LLD and LLS), neutron
porosity (NPHI), density (RHOB), caliper (CALI), neutron (NEU), sonic transit time
(DT), bulk density (DEN), deep resistivity (RD), true resistivity (RT), shallow resistivity
(RES SLW), total porosity (PHIT), and water saturation (SW). The correlation coefficient
between the input parameters and the target variables is essential to determine
which parameters are appropriate for predictive analytics and the data type, which
may be numerical or categorical. Thus, a few important variables can be chosen to construct
the best model for increased accuracy. For example, the study that used 14 variables
with XGBoost achieved a substantial result of 97%, while the
study that only utilized GR, Vp, LLD and LLS, NPHI, and RHOB with the LSTM
model achieved a slightly lower result of 94%. These three well-known datasets, which
have been utilized in recent research in the O&G sector, demonstrate the importance
of determining the correlation between target and input parameters to compare which
variables are appropriate for models to provide significant outcomes in the research.
• The assessment of O&G research revealed an increase in published papers over time.
As seen in Figure 2, the rise in O&G discoveries due to the dependence of technological
advancements on the usage of gas and petroleum, as well as the annual progress of ML
and AI tools, has resulted in more studies in this field utilizing AI-based models. As
shown in Figure 2, 32 research publications appeared in this field in 2021. The number of
articles released in 2022 decreased by seven, to 25 published research papers. This
reduction can be attributed to the continued development of AI and the gradual progression
of interest in O&G research. The count then rose again, with 34 articles published in this
field in 2023. This increase may reflect a recognized need for improvement in
AI-based models in the O&G area. Many O&G companies have followed the IR4.0
road to integrate AI in their organization and reduce the likelihood of future expenses
by forecasting future events.
• Throughout the research period, developments in AI models resulted in more com-
plicated and interconnected models, giving researchers tools to construct more exact
and resilient models. A similar finding was reached while investigating the use of
various models in predictive analytics in the O&G industry during the last three
years. Figure 4a depicts a thorough breakdown of the most common model types
used for predictive analytics in the O&G industry, illustrated by a pie chart. The chart
shows that the largest share, 37% of all models, is classified
as “others”, which primarily include foundational models such as SVR, GRU, MLP,
and boosting-based models (shown in Figure 4b). Due to their improved efficiency,
accuracy, and capacity to handle non-linear datasets, these models have become quite
popular. This selection of models shows that there is still a lot of remaining potential
in this field.
• The analysis of predictive analytics research publications from 2021 to 2023 focuses
heavily on several areas of the O&G sector. Crude oils (7), oil (5), reservoirs (16),
pipelines (16), drilling (5), wells (20), transformers (10), gas (10), and lithology (2)
all appear as recurring subjects across the studies. The frequency of these terms
demonstrates the industry’s strong interest in using predictive analytics to optimize
operations and decision-making in various sectors, including reservoir management,
drilling procedures, pipeline integrity, and transformer health. This trend represents
a deliberate effort in the O&G industry to use sophisticated analytics for greater effi-
ciency, risk management, and overall operational excellence. Figure 5 is the graphical
summary of the types of O&G sectors in research articles.
and RF models are very variable, with some obtaining outstanding accuracy and others
doing poorly. Interrelated AI models have consistently obtained excellent accuracy.
Statistical models, such as the ARIMA, perform poorly compared to other categories,
showing their limits with complicated datasets. Predictive analytics models normally
perform well. Yet, there is a significant outlier in predictive analytics modeling:
K+MC, with 18% accuracy.
• Performance levels differ among model categories, as shown in Figure 7. ANN models
perform well on average, with an accuracy of 89.23%, but performance can vary
greatly depending on specific variations and modifications, as shown by several
outliers. DL models perform well, with an average accuracy of 93.73%, demonstrating
less variability and solid outcomes across diverse versions. Fuzzy Logic and Neuro-
fuzzy models stand out for their excellent and constant performance, with an average
accuracy of 99%, making them extremely trustworthy for their applications. DT,
RF, and hybrid models exhibit great variability; although models like CATBOOST
and DT attain excellent accuracy, others, such as RF+Analog-to-digital converters,
perform poorly. Interrelated AI models perform consistently well, with an average
accuracy of 97.67%. In comparison, the ARIMA model from the statistical model
category performs inadequately, with 63% accuracy, demonstrating limits in dealing
with complex information. Models used for predictive analytics in the O&G field
typically perform well, although there are a few distinct instances. Overall, while
the most advanced AI models perform well, the diversity in particular categories
emphasizes the significance of model selection and modification for the best outcomes.
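The well-logging item in the list above points to the correlation coefficient between input parameters and the target variable as the basis for choosing inputs. A minimal sketch of that screening step is given below, assuming numerical features and using the Pearson coefficient; the function names and every data value are hypothetical and serve only to illustrate the ranking.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; assumes neither sequence is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rank_features(features, target):
    """Sort feature names by the absolute value of their correlation with the target."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical well-log readings and a hypothetical target (e.g., porosity).
logs = {
    "GR": [2.0, 4.0, 6.0, 8.0, 10.0],
    "RHOB": [5.0, 3.0, 4.0, 1.0, 2.0],
    "CALI": [3.0, 1.0, 2.0, 3.0, 1.0],
}
target = [1.0, 2.0, 3.0, 4.0, 5.0]
ranking = rank_features(logs, target)
```

Ranking by absolute value keeps strongly negative correlations, such as the hypothetical RHOB column here, near the top. The Pearson coefficient captures only linear association, so a low score does not rule out a non-linear relationship that a tree-based or neural model could still exploit.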
Input Parameter of Undesirable Well Events, with the number of studies that use it, out of
the ten compared ([86], [99], [22], [73], [130], [87], [88], [10], [85], [135]):
P-PDG: 10
P-TPT: 9
T-TPT: 9
P-MON-CKP: 8
T-JUS-CKP: 7
T-JUS-CKGL: 3
P-JUS-CKGL: 3
P-CKGL: 1
QGL: 4
T-PDG: 1
T-PCK: 2
Table 9. Input parameters for the fault detection of transformer oil from the DGA dataset.
Input Parameter of Internal Transformer Defects, with the number of studies that use it, out
of the ten compared ([35], [122], [40], [83], [20], [98], [59], [139], [65], [111]):
Acetylene (C2H2): 8
Ethylene (C2H4): 9
Ethane (C2H6): 9
Methane (CH4): 9
Hydrogen (H2): 8
Total Hydrocarbon (TH): 1
Carbon Monoxide (CO): 5
Carbon Dioxide (CO2): 5
Ammonia (NH3): 1
Acetaldehyde (CH3CHO): 1
Acetone ((CH3)2CO): 1
Nitrogen (N2): 1
Ethanol (CH3CH2OH): 1
Figure 4. Preferred AI model types in the research articles about predictive analytics in the O&G
field: (a) overview of the AI models used in the publications and (b) extended “others” section.
Input Parameter of Well Logging, with the number of studies that use it, out of the six
compared ([64], [106], [104], [140], [100], [108]):
Gamma Ray (GR): 6
Sonic (Vp): 2
Deep and Shallow Resistivities (LLD and LLS): 2
Neutron porosity (NPHI): 2
Density (RHOB): 4
Caliper (CALI): 3
Neutron (NEU): 3
Sonic Transit Time (DT): 4
Figure 5. Types of O&G sectors in research articles from 2021 to 2023.
• Several performance measures have been utilized in O&G research, demonstrating
diverse assessment criteria for predictive analytics models (see Figure 6). The performance
metrics help understand the models’ performance since they might show
many model characteristics. Figure 6a, which shows the various performance
measures used in the research, demonstrates that accuracy (49) was the most preferred
for calculating the correctly predicted value versus the actual one. This performance
measure is appropriate for categorical data types and classification predictive
analysis because it is simple to grasp and indicates whether all the classes are balanced.
However, utilizing accuracy for unbalanced classes has limitations since it can
be deceptive; alternative measures like precision, recall, F1 score, or AUC may be
more helpful. Aside from that, the researchers’ second chosen performance indicator
in their research is R2 (41). This performance indicator is commonly employed in regression
analysis and numerical data since it measures the relationship between the
independent and dependent variables.
Figure 6. Preferred performance metrics by the researcher: (a) combination of performance metrics
used in publications. (b) All additional performance metrics displayed.
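The caution about accuracy on unbalanced classes can be made concrete with a small sketch. It assumes a binary fault-detection setting summarized by confusion-matrix counts; the function name `summarize_confusion` and the counts are hypothetical and are not drawn from any reviewed study.

```python
def summarize_confusion(tp, fp, fn, tn):
    """Accuracy, precision and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical data: 1000 samples, only 50 of them faults.
# A model that always predicts "normal" misses every fault.
always_normal = summarize_confusion(tp=0, fp=0, fn=50, tn=950)

# A model that finds 40 of the 50 faults at the cost of 30 false alarms.
fault_detector = summarize_confusion(tp=40, fp=30, fn=10, tn=920)
```

In this toy case the trivial model reaches 95% accuracy with zero recall, while the second model scores only slightly higher on accuracy (96%) yet recovers 80% of the faults. That difference is what the alternative measures named above are meant to expose.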
Table 11. A summary of each ML method’s accuracy for predictive analytics in the O&G industry
from previous studies.
Figure 7. Average accuracy of ML models in the O&G industry.
4. Future Research Directions
As predictive analytics in the O&G industry continues to evolve, several avenues for
future research and development emerge. First, exploring the integration of advanced
Deep Learning techniques, such as RNN and LSTM networks, could enhance the temporal
predictive capabilities of existing models. These architectures are adept at capturing sequential
dependencies and time series patterns, which could prove invaluable for forecasting
dynamic aspects like O&G production rates or pipeline conditions. Second, investigating
explainability and interpretability in complex models, such as ensemble techniques and
Deep Learning networks, continues to be an important area of research. Developing methods
to elucidate the decision-making processes of these models can enhance the trust and
acceptance of predictive analytics in decision support systems within the O&G domain.
Furthermore, there is potential for extending research into the optimization of hybrid
models, focusing on refining parameter-tuning strategies and evaluating the robustness
of these approaches across diverse datasets and scenarios. For instance, understanding
how QPSO or FDGGM parameters impact model performance could lead to more effective
and efficient hybrid predictive systems. Additionally, exploring predictive analytics for
emerging challenges in the industry, such as sustainability, environmental impact, and
safety, could open new avenues for research. Predicting the environmental consequences
of O&G activities or developing models for proactive safety monitoring could contribute
significantly to the industry’s responsible and sustainable practices.
Finally, comprehensive benchmarking studies are needed to compare the performance
of various predictive models under many circumstances and datasets. This could facilitate
the identification of the most suitable models for specific applications within the O&G sector,
providing practitioners with insightful information for making decisions. In conclusion,
future research in predictive analytics for the O&G industry should delve into advanced
Deep Learning architectures, enhance model interpretability, optimize hybrid approaches,
5. Conclusions
This review aimed to provide a thorough overview of the utilization of ML models in
simulating predictive analytics within the O&G sectors. From 2021 to 2023, we collected
data from respectable journals indexed in Web of Science, Science Direct, Scopus, and
IEEE. The analysis revealed that seven iterations of ML models had been employed in
predictive analytics modeling for the O&G industry. The survey identified key components
within existing predictive analytics models for the O&G field.
These elements included model types, temporal aspects of the data and the
field, the name of the data, dataset types, predictive analytics methodologies (such as
classification, clustering, or prediction), model input and output parameters, performance
metrics, optimal models, and the advantages and disadvantages of the models. Rigorous
scientific assessments and evaluations were conducted on the surveyed studies, leading to
detailed discussions on numerous findings. This review also highlights various potential
future research directions based on the current state of the literature, providing insightful
information to interested professionals in this sector.
Author Contributions: P.A.R.A., writing—original draft preparation and visualization; M.Y., review
and editing and supervision; and M.T.M.S.-d., funding acquisition. All authors have read and agreed
to the published version of the manuscript.
Funding: This research was funded by Petronas Research Sdn. Bhd. (PRSB), grant number
20220801012.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: This study did not report any data.
Conflicts of Interest: The authors declare no conflicts of interest.
Abbreviations
References
1. Liang, J.; Li, C.; Sun, K.; Zhang, S.; Wang, S.; Xiang, J.; Hu, S.; Wang, Y.; Hu, X. Activation of mixed sawdust and spirulina with
or without a pre-carbonization step: Probing roles of volatile-char interaction on evolution of pyrolytic products. Fuel Process.
Technol. 2023, 250, 107926. [CrossRef]
2. Xu, L.; Wang, Y.; Mo, L.; Tang, Y.; Wang, F.; Li, C. The research progress and prospect of data mining methods on corrosion
prediction of oil and gas pipelines. Eng. Fail. Anal. 2023, 144, 106951. [CrossRef]
3. Yusoff, M.; Ehsan, D.; Sharif, M.Y.; Sallehud-Din, M.T.M. Topology Approach for Crude Oil Price Forecasting of Particle Swarm
Optimization and Long Short-Term Memory. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 524–532. [CrossRef]
4. Yusoff, M.; Sharif, M.Y.; Sallehud-Din, M.T.M. Long Term Short Memory with Particle Swarm Optimization for Crude Oil Price
Prediction. In Proceedings of the 2023 7th International Symposium on Innovative Approaches in Smart Technologies (ISAS),
Istanbul, Turkiye, 23–25 November 2023; pp. 1–4. [CrossRef]
5. Sharma, R.; Villányi, B. Evaluation of corporate requirements for smart manufacturing systems using predictive analytics. Internet
Things 2022, 19, 100554. [CrossRef]
6. Mahfuz, N.M.; Yusoff, M.; Ahmad, Z. Review of single clustering methods. IAES Int. J. Artif. Intell. 2019, 8, 221–227. [CrossRef]
7. Henrys, K. Role of Predictive Analytics in Business. SSRN Electron. J. 2021. [CrossRef]
8. Tewari, S.; Dwivedi, U.D.; Biswas, S. A novel application of ensemble methods with data resampling techniques for drill bit
selection in the oil and gas industry. Energies 2021, 14, 432. [CrossRef]
9. Allouche, I.; Zheng, Q.; Yoosef-Ghodsi, N.; Fowler, M.; Li, Y.; Adeeb, S. Enhanced predictive method for pipeline strain demand
subject to permanent ground displacements with internal pressure & temperature: A finite difference approach. J. Infrastruct.
Intell. Resil. 2023, 2, 100030. [CrossRef]
10. Carvalho, B.G.; Vargas, R.E.V.; Salgado, R.M.; Munaro, C.J.; Varejao, F.M. Flow Instability Detection in Offshore Oil Wells with
Multivariate Time Series Machine Learning Classifiers. In Proceedings of the 2021 IEEE 30th International Symposium on
Industrial Electronics (ISIE), Kyoto, Japan, 20–23 June 2021; pp. 1–6. [CrossRef]
11. Ohalete, N.C.; Aderibigbe, A.O.; Ani, E.C.; Ohenhen, P.E.; Akinoso, A. Advancements in predictive maintenance in the oil and
gas industry: A review of AI and data science applications. World J. Adv. Res. Rev. 2023, 20, 167–181. [CrossRef]
12. Tariq, Z.; Aljawad, M.S.; Hasan, A.; Murtaza, M.; Mohammed, E.; El-Husseiny, A.; Alarifi, S.A.; Mahmoud, M.; Abdulraheem, A.
A Systematic Review of Data Science and Machine Learning Applications to the Oil and Gas Industry. J. Pet. Explor. Prod. Technol.
2021, 11, 4339–4374. [CrossRef]
13. Yu, X.; Wang, J.; Hong, Q.-Q.; Teku, R.; Wang, S.-H.; Zhang, Y.-D. Transfer learning for medical images analyses: A survey.
Neurocomputing 2022, 489, 230–254. [CrossRef]
14. Barkana, B.D.; Ozkan, Y.; Badara, J.A. Analysis of working memory from EEG signals under different emotional states. Biomed.
Signal Process. Control. 2022, 71, 103249. [CrossRef]
15. Chen, W.; Huang, H.; Huang, J.; Wang, K.; Qin, H.; Wong, K.K. Deep learning-based medical image segmentation of the aorta
using XR-MSF-U-Net. Comput. Methods Programs Biomed. 2022, 225, 107073. [CrossRef] [PubMed]
16. Huang, C.; Gu, B.; Chen, Y.; Tan, X.; Feng, L. Energy return on energy, carbon, and water investment in oil and gas resource
extraction: Methods and applications to the Daqing and Shengli oilfields. Energy Policy 2019, 134, 110979. [CrossRef]
17. Hazboun, S.; Boudet, H. Chapter 8—A ‘thin green line’ of resistance? Assessing public views on oil, natural gas, and coal export
in the Pacific Northwest region of the United States and Canada. In Public Responses to Fossil Fuel Export; Boudet, H., Hazboun, S.,
Eds.; Elsevier: Amsterdam, The Netherlands, 2022; pp. 121–139.
18. Champeecharoensuk, A.; Dhakal, S.; Chollacoop, N.; Phdungsilp, A. Greenhouse gas emissions trends and drivers insights from
the domestic aviation in Thailand. Heliyon 2024, 10, e24206. [CrossRef] [PubMed]
19. Centobelli, P.; Cerchione, R.; Del Vecchio, P.; Oropallo, E.; Secundo, G. Blockchain technology for bridging trust, traceability and
transparency in circular supply chain. Inf. Manag. 2022, 59, 103508. [CrossRef]
20. Majed, H.; Al-Janabi, S.; Mahmood, S. Data Science for Genomics (GSK-XGBoost) for Prediction Six Types of Gas Based on
Intelligent Analytics. In Proceedings of the 2022 22nd International Conference on Computational Science and Its Applications
(ICCSA), Malaga, Spain, 4–7 July 2022; pp. 28–34. [CrossRef]
21. Waterworth, A.; Bradshaw, M.J. Unconventional trade-offs? National oil companies, foreign investment and oil and gas
development in Argentina and Brazil. Energy Policy 2018, 122, 7–16. [CrossRef]
22. Marins, M.A.; Barros, B.D.; Santos, I.H.; Barrionuevo, D.C.; Vargas, R.E.; de M. Prego, T.; de Lima, A.A.; de Campos, M.L.; da
Silva, E.A.; Netto, S.L. Fault detection and classification in oil wells and production/service lines using random forest. J. Pet. Sci.
Eng. 2020, 197, 107879. [CrossRef]
23. Dhaked, D.K.; Dadhich, S.; Birla, D. Power output forecasting of solar photovoltaic plant using LSTM. Green Energy Intell. Transp.
2023, 2, 100113. [CrossRef]
24. Yan, R.; Wang, S.; Peng, C. An Artificial Intelligence Model Considering Data Imbalance for Ship Selection in Port State Control
Based on Detention Probabilities. J. Comput. Sci. 2021, 48, 101257. [CrossRef]
25. Agwu, O.E.; Okoro, E.E.; Sanni, S.E. Modelling oil and gas flow rate through chokes: A critical review of extant models. J. Pet. Sci.
Eng. 2022, 208, 109775. [CrossRef]
26. Nandhini, K.; Tamilpavai, G. Hybrid CNN-LSTM and modified wild horse herd Model-based prediction of genome sequences
for genetic disorders. Biomed. Signal Process. Control. 2022, 78, 103840. [CrossRef]
27. Balaji, S.; Karthik, S. Deep Learning Based Energy Consumption Prediction on Internet of Things Environment. Intell. Autom. Soft
Comput. 2023, 37, 727–743. [CrossRef]
28. Yang, H.; Liu, X.; Chu, X.; Xie, B.; Zhu, G.; Li, H.; Yang, J. Optimization of tight gas reservoir fracturing parameters via gradient
boosting regression modeling. Heliyon 2024, 10, e27015. [CrossRef] [PubMed]
29. de los Ángeles Sánchez Morales, M.; Anguiano, F.I.S. Data science—Time series analysis of oil & gas production in Mexican fields.
Procedia Comput. Sci. 2022, 200, 21–30. [CrossRef]
30. Tan, Y.; Al-Huqail, A.A.; Chen, Q.; Majdi, H.S.; Algethami, J.S.; Ali, H.E. Analysis of groundwater pollution in a petroleum
refinery energy contributed in rock mechanics through ANFIS-AHP. Int. J. Energy Res. 2022, 46, 20928–20938. [CrossRef]
31. Wu, M.; Wang, G.; Liu, H. Research on Transformer Fault Diagnosis Based on SMOTE and Random Forest. In Proceedings of
the 2022 4th International Conference on Electrical Engineering and Control Technologies (CEECT), Shanghai, China, 16–18
December 2022; pp. 359–363. [CrossRef]
32. Dashti, Q.; Matar, S.; Abdulrazzaq, H.; Al-Shammari, N.; Franco, F.; Haryanto, E.; Zhang, M.Q.; Prakash, R.; Bolanos, N.; Ibrahim,
M.; et al. Data Analytics into Hydraulic Modelling for Better Understanding of Well/Surface Network Limits, Proactively Identify
Challenges and, Provide Solutions for Improved System Performance in the Greater Burgan Field. In Proceedings of the Abu
Dhabi International Petroleum Exhibition & Conference, Abu Dhabi, United Arab Emirates, 15–18 November 2021. [CrossRef]
33. Wang, X.; Daryapour, M.; Shahrabadi, A.; Pirasteh, S.; Razavirad, F. Artificial neural networks in predicting of the gas molecular
diffusion coefficient. Chem. Eng. Res. Des. 2023, 200, 407–418. [CrossRef]
34. Kamarudin, R.; Ang, Y.; Topare, N.; Ismail, M.; Mustafa, K.; Gunnasegaran, P.; Abdullah, M.; Mazlan, N.; Badruddin, I.; Zedan, A.;
et al. Influence of oxyhydrogen gas retrofit into two-stroke engine on emissions and exhaust gas temperature variations. Heliyon
2024, 10, e26597. [CrossRef] [PubMed]
35. Raghuraman, R.; Darvishi, A. Detecting Transformer Fault Types from Dissolved Gas Analysis Data Using Machine Learning
Techniques. In Proceedings of the 2022 IEEE 15th Dallas Circuit and System Conference (DCAS), Dallas, TX, USA, 17–19 June
2022; pp. 1–5. [CrossRef]
36. Mukherjee, T.; Burgett, T.; Ghanchi, T.; Donegan, C.; Ward, T. Predicting Gas Production Using Machine Learning Methods: A
Case Study. In Proceedings of the SEG International Exposition and Annual Meeting, San Antonio, TX, USA, 25 September 2019;
pp. 2248–2252. [CrossRef]
37. Dixit, N.; McColgan, P.; Kusler, K. Machine Learning-Based Probabilistic Lithofacies Prediction from Conventional Well Logs: A
Case from the Umiat Oil Field of Alaska. Energies 2020, 13, 4862. [CrossRef]
38. Aldosari, H.; Elfouly, R.; Ammar, R. Evaluation of Machine Learning-Based Regression Techniques for Prediction of Oil and Gas
Pipelines Defect. In Proceedings of the 2020 International Conference on Computational Science and Computational Intelligence
(CSCI), Las Vegas, NV, USA, 16–18 December 2020; pp. 1452–1456. [CrossRef]
39. Elmousalami, H.H.; Elaskary, M. Drilling stuck pipe classification and mitigation in the Gulf of Suez oil fields using artificial
intelligence. J. Pet. Explor. Prod. Technol. 2020, 10, 2055–2068. [CrossRef]
40. Taha, I.B.; Mansour, D.-E.A. Novel Power Transformer Fault Diagnosis Using Optimized Machine Learning Methods. Intell.
Autom. Soft Comput. 2021, 28, 739–752. [CrossRef]
41. Tiyasha; Tung, T.M.; Yaseen, Z.M. A survey on river water quality modelling using artificial intelligence models: 2000–2020. J.
Hydrol. 2020, 585, 124670. [CrossRef]
42. Agatonovic-Kustrin, S.; Beresford, R. Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical
research. J. Pharm. Biomed. Anal. 2000, 22, 717–727. [CrossRef] [PubMed]
43. Tao, H.; Hameed, M.M.; Marhoon, H.A.; Zounemat-Kermani, M.; Heddam, S.; Kim, S.; Sulaiman, S.O.; Tan, M.L.; Sa’adi, Z.; Mehr,
A.D.; et al. Groundwater level prediction using machine learning models: A comprehensive review. Neurocomputing 2022, 489,
271–308. [CrossRef]
44. Kalam, S.; Yousuf, U.; Abu-Khamsin, S.A.; Bin Waheed, U.; Khan, R.A. An ANN model to predict oil recovery from a 5-spot
waterflood of a heterogeneous reservoir. J. Pet. Sci. Eng. 2022, 210, 110012. [CrossRef]
45. Eckert, E.; Bělohlav, Z.; Vaněk, T.; Zámostný, P.; Herink, T. ANN modelling of pyrolysis utilising the characterisation of
atmospheric gas oil based on incomplete data. Chem. Eng. Sci. 2007, 62, 5021–5025. [CrossRef]
46. Qin, G.; Xia, A.; Lu, H.; Wang, Y.; Li, R.; Wang, C. A hybrid machine learning model for predicting crater width formed by
explosions of natural gas pipelines. J. Loss Prev. Process. Ind. 2023, 82, 104994. [CrossRef]
47. Wang, Q.; Song, Y.; Zhang, X.; Dong, L.; Xi, Y.; Zeng, D.; Liu, Q.; Zhang, H.; Zhang, Z.; Yan, R.; et al. Evolution of corrosion
prediction models for oil and gas pipelines: From empirical-driven to data-driven. Eng. Fail. Anal. 2023, 146, 107097. [CrossRef]
48. Sami, N.A.; Ibrahim, D.S. Forecasting multiphase flowing bottom-hole pressure of vertical oil wells using three machine learning
techniques. Pet. Res. 2021, 6, 417–422. [CrossRef]
49. Chohan, H.Q.; Ahmad, I.; Mohammad, N.; Manca, D.; Caliskan, H. An integrated approach of artificial neural networks and
polynomial chaos expansion for prediction and analysis of yield and environmental impact of oil shale retorting process under
uncertainty. Fuel 2022, 329, 125351. [CrossRef]
50. Carvalho, G.d.A.; Minnett, P.J.; Ebecken, N.F.F.; Landau, L. Machine-Learning Classification of SAR Remotely-Sensed Sea-Surface
Petroleum Signatures—Part 1: Training and Testing Cross Validation. Remote Sens. 2022, 14, 3027. [CrossRef]
51. Li, X.; Han, W.; Shao, W.; Chen, L.; Zhao, D. Data-Driven Predictive Model for Mixed Oil Length Prediction in Long-Distance
Transportation Pipeline. In Proceedings of the 2021 IEEE 10th Data Driven Control and Learning Systems Conference (DDCLS),
Suzhou, China, 14–16 May 2021; pp. 1486–1491. [CrossRef]
52. Mendoza, J.H.; Tariq, R.; Espinosa, L.F.S.; Anguebes, F.; Bassam, A. Soft Computing Tools for Multiobjective Optimization of
Offshore Crude Oil and Gas Separation Plant for the Best Operational Condition. In Proceedings of the 2021 18th International
Conference on Electrical Engineering, Computing Science and Automatic Control (CCE), Mexico City, Mexico, 10–12 November
2021; pp. 1–6. [CrossRef]
53. Sakhaei, A.; Zamir, S.M.; Rene, E.R.; Veiga, M.C.; Kennes, C. Neural network-based performance assessment of one- and
two-liquid phase biotrickling filters for the removal of a waste-gas mixture containing methanol, α-pinene, and hydrogen sulfide.
Environ. Res. 2023, 237, 116978. [CrossRef] [PubMed]
54. Hasanzadeh, M.; Madani, M. Deterministic tools to predict gas assisted gravity drainage recovery factor. Energy Geosci. 2023, 5,
100267. [CrossRef]
55. Zhang, X.-Q.; Cheng, Q.-L.; Sun, W.; Zhao, Y.; Li, Z.-M. Research on a TOPSIS energy efficiency evaluation system for crude oil
gathering and transportation systems based on a GA-BP neural network. Pet. Sci. 2023, 21, 621–640. [CrossRef]
56. Ismail, A.; Ewida, H.F.; Nazeri, S.; Al-Ibiary, M.G.; Zollo, A. Gas channels and chimneys prediction using artificial neural networks
and multi-seismic attributes, offshore West Nile Delta, Egypt. J. Pet. Sci. Eng. 2022, 208, 109349. [CrossRef]
57. Goliatt, L.; Saporetti, C.; Oliveira, L.; Pereira, E. Performance of evolutionary optimized machine learning for modeling total
organic carbon in core samples of shale gas fields. Petroleum 2023, 10, 150–164. [CrossRef]
58. Amar, M.N.; Ghahfarokhi, A.J.; Ng, C.S.W.; Zeraibi, N. Optimization of WAG in real geological field using rigorous soft computing
techniques and nature-inspired algorithms. J. Pet. Sci. Eng. 2021, 206, 109038. [CrossRef]
59. Mao, W.; Wei, B.; Xu, X.; Chen, L.; Wu, T.; Peng, Z.; Ren, C. Power transformers fault diagnosis using graph neural networks
based on dissolved gas data. J. Phys. Conf. Ser. 2022, 2387, 012029. [CrossRef]
60. Ghosh, I.; Chaudhuri, T.D.; Alfaro-Cortés, E.; Gámez, M.; García, N. A hybrid approach to forecasting futures prices with
simultaneous consideration of optimality in ensemble feature selection and advanced artificial intelligence. Technol. Forecast. Soc.
Chang. 2022, 181, 121757. [CrossRef]
61. Wang, B.; Guo, Y.; Wang, D.; Zhang, Y.; He, R.; Chen, J. Prediction model of natural gas pipeline crack evolution based on
optimized DCNN-LSTM. Mech. Syst. Signal Process. 2022, 181, 109557. [CrossRef]
62. Yang, R.; Liu, X.; Yu, R.; Hu, Z.; Duan, X. Long short-term memory suggests a model for predicting shale gas production. Appl.
Energy 2022, 322, 119415. [CrossRef]
63. Werneck, R.d.O.; Prates, R.; Moura, R.; Gonçalves, M.M.; Castro, M.; Soriano-Vargas, A.; Júnior, P.R.M.; Hossain, M.M.; Zampieri,
M.F.; Ferreira, A.; et al. Data-driven deep-learning forecasting for oil production and pressure. J. Pet. Sci. Eng. 2022, 210, 109937.
[CrossRef]
64. Antariksa, G.; Muammar, R.; Nugraha, A.; Lee, J. Deep sequence model-based approach to well log data imputation and
petrophysical analysis: A case study on the West Natuna Basin, Indonesia. J. Appl. Geophys. 2023, 218, 105213. [CrossRef]
65. Das, S.; Paramane, A.; Chatterjee, S.; Rao, U.M. Accurate Identification of Transformer Faults from Dissolved Gas Data Using
Recursive Feature Elimination Method. IEEE Trans. Dielectr. Electr. Insul. 2023, 30, 466–473. [CrossRef]
66. Barjouei, H.S.; Ghorbani, H.; Mohamadian, N.; Wood, D.A.; Davoodi, S.; Moghadasi, J.; Saberi, H. Prediction performance
advantages of deep machine learning algorithms for two-phase flow rates through wellhead chokes. J. Pet. Explor. Prod. Technol.
2021, 11, 1233–1261. [CrossRef]
67. Martínez, V.; Rocha, A. The Golem: A General Data-Driven Model for Oil & Gas Forecasting Based on Recurrent Neural Networks.
IEEE Access 2023, 11, 41105–41132. [CrossRef]
68. Wang, Z.; Bai, L.; Song, G.; Zhang, Y.; Zhu, M.; Zhao, M.; Chen, L.; Wang, M. Optimized faster R-CNN for oil wells detection from
high-resolution remote sensing images. Int. J. Remote Sens. 2023, 44, 6897–6928. [CrossRef]
69. Hiassat, A.; Diabat, A.; Rahwan, I. A genetic algorithm approach for location-inventory-routing problem with perishable products.
J. Manuf. Syst. 2017, 42, 93–103. [CrossRef]
70. Sharma, V.; Cali, Ü.; Sardana, B.; Kuzlu, M.; Banga, D.; Pipattanasomporn, M. Data-driven short-term natural gas demand
forecasting with machine learning techniques. J. Pet. Sci. Eng. 2021, 206, 108979. [CrossRef]
71. Phan, H.C.; Duong, H.T. Predicting burst pressure of defected pipeline with Principal Component Analysis and adaptive Neuro
Fuzzy Inference System. Int. J. Press. Vessel. Pip. 2021, 189, 104274. [CrossRef]
72. Hamedi, H.; Zendehboudi, S.; Rezaei, N.; Saady, N.M.C.; Zhang, B. Modeling and optimization of oil adsorption capacity on
functionalized magnetic nanoparticles using machine learning approach. J. Mol. Liq. 2023, 392, 123378. [CrossRef]
73. Castro, A.O.D.S.; Santos, M.D.J.R.; Leta, F.R.; Lima, C.B.C.; Lima, G.B.A. Unsupervised Methods to Classify Real Data from
Offshore Wells. Am. J. Oper. Res. 2021, 11, 227–241. [CrossRef]
74. Ma, B.; Shuai, J.; Liu, D.; Xu, K. Assessment on failure pressure of high strength pipeline with corrosion defects. Eng. Fail. Anal.
2013, 32, 209–219. [CrossRef]
75. Shuai, Y.; Shuai, J.; Xu, K. Probabilistic analysis of corroded pipelines based on a new failure pressure model. Eng. Fail. Anal.
2017, 81, 216–233. [CrossRef]
76. Phan, H.C.; Dhar, A.S.; Mondal, B.C. Revisiting burst pressure models for corroded pipelines. Can. J. Civ. Eng. 2017, 44, 485–494.
[CrossRef]
77. Freire, J.; Vieira, R.; Castro, J.; Benjamin, A. Part 3: Burst tests of pipeline with extensive longitudinal metal loss. Exp. Tech. 2006,
30, 60–65. [CrossRef]
78. Cronin, D.S. Assessment of Corrosion Defects in Pipelines. Ph.D. Thesis, University of Waterloo, Waterloo, ON, Canada, 2000.
79. Ghasemieh, A.; Lloyed, A.; Bahrami, P.; Vajar, P.; Kashef, R. A novel machine learning model with Stacking Ensemble Learner for
predicting emergency readmission of heart-disease patients. Decis. Anal. J. 2023, 7, 100242. [CrossRef]
80. Jeny, J.R.V.; Reddy, N.S.; Aishwarya, P.; Samreen. A Classification Approach for Heart Disease Diagnosis using Machine Learning.
In Proceedings of the 2021 6th International Conference on Signal Processing, Computing and Control (ISPCC), Solan, India, 7–9
October 2021; pp. 456–459. [CrossRef]
81. Mazumder, R.K.; Salman, A.M.; Li, Y. Failure risk analysis of pipelines using data-driven machine learning algorithms. Struct. Saf.
2021, 89, 102047. [CrossRef]
82. Liu, S.; Zhao, Y.; Wang, Z. Artificial Intelligence Method for Shear Wave Travel Time Prediction considering Reservoir Geological
Continuity. Math. Probl. Eng. 2021, 2021, 5520428. [CrossRef]
83. Saroja, S.; Haseena, S.; Madavan, R. Dissolved Gas Analysis of Transformer: An Approach Based on ML and MCDM. IEEE Trans.
Dielectr. Electr. Insul. 2023, 30, 2429–2438. [CrossRef]
84. Raj, R.A.; Sarathkumar, D.; Venkatachary, S.K.; Andrews, L.J.B. Classification and Prediction of Incipient Faults in Transformer
Oil by Supervised Machine Learning using Decision Tree. In Proceedings of the 2023 3rd International Conference on Artificial
Intelligence and Signal Processing (AISP), Vijayawada, India, 18–20 March 2023; pp. 1–6. [CrossRef]
85. Aslam, N.; Khan, I.U.; Alansari, A.; Alrammah, M.; Alghwairy, A.; Alqahtani, R.; Alqahtani, R.; Almushikes, M.; AL Hashim, M.
Anomaly Detection Using Explainable Random Forest for the Prediction of Undesirable Events in Oil Wells. Appl. Comput. Intell.
Soft Comput. 2022, 2022, 1558381. [CrossRef]
86. Turan, E.M.; Jaschke, J. Classification of undesirable events in oil well operation. In Proceedings of the 2021 23rd International
Conference on Process Control (PC), Strbske Pleso, Slovakia, 1–4 June 2021; pp. 157–162. [CrossRef]
87. Gatta, F.; Giampaolo, F.; Chiaro, D.; Piccialli, F. Predictive maintenance for offshore oil wells by means of deep learning features
extraction. Expert Syst. 2022, 41, e13128. [CrossRef]
88. Brønstad, C.; Netto, S.L.; Ramos, A.L.L. Data-driven Detection and Identification of Undesirable Events in Subsea Oil Wells. In
Proceedings of the SENSORDEVICES 2021 Twelfth International Conference on Sensor Device Technologies and Applications,
Athens, Greece, 14–18 November 2021; pp. 1–6.
89. Ben Jabeur, S.; Khalfaoui, R.; Ben Arfi, W. The effect of green energy, global environmental indexes, and stock markets in
predicting oil price crashes: Evidence from explainable machine learning. J. Environ. Manag. 2021, 298, 113511. [CrossRef]
[PubMed]
90. Baabbad, H.K.H.; Artun, E.; Kulga, B. Understanding the Controlling Factors for CO2 Sequestration in Depleted Shale Reservoirs
Using Data Analytics and Machine Learning. In Proceedings of the SPE EuropEC—Europe Energy Conference featured at the
83rd EAGE Annual Conference & Exhibition, Madrid, Spain, 6–9 June 2022. [CrossRef]
91. Alsaihati, A.; Elkatatny, S.; Mahmoud, A.A.; Abdulraheem, A. Use of Machine Learning and Data Analytics to Detect Downhole
Abnormalities While Drilling Horizontal Wells, with Real Case Study. J. Energy Resour. Technol. Trans. ASME 2021, 143, 043201.
[CrossRef]
92. Kumar, A.; Hassanzadeh, H. A qualitative study of the impact of random shale barriers on SAGD performance using data
analytics and machine learning. J. Pet. Sci. Eng. 2021, 205, 108950. [CrossRef]
93. Ma, H.; Wang, H.; Geng, M.; Ai, Y.; Zhang, W.; Zheng, W. A new hybrid approach model for predicting burst pressure of corroded
pipelines of gas and oil. Eng. Fail. Anal. 2023, 149, 107248. [CrossRef]
94. Canonaco, G.; Roveri, M.; Alippi, C.; Podenzani, F.; Bennardo, A.; Conti, M.; Mancini, N. A Machine-Learning Approach for the
Prediction of Internal Corrosion in Pipeline Infrastructures. In Proceedings of the 2021 IEEE International Instrumentation and
Measurement Technology Conference (I2MTC), Glasgow, UK, 17–20 May 2021; pp. 1–6. [CrossRef]
95. Fang, J.; Cheng, X.; Gai, H.; Lin, S.; Lou, H. Development of machine learning algorithms for predicting internal corrosion of
crude oil and natural gas pipelines. Comput. Chem. Eng. 2023, 177, 108358. [CrossRef]
96. Lv, Q.; Zheng, R.; Guo, X.; Larestani, A.; Hadavimoghaddam, F.; Riazi, M.; Hemmati-Sarapardeh, A.; Wang, K.; Li, J. Modelling
minimum miscibility pressure of CO2-crude oil systems using deep learning, tree-based, and thermodynamic models: Application
to CO2 sequestration and enhanced oil recovery. Sep. Purif. Technol. 2023, 310, 123086. [CrossRef]
97. Zhu, X.; Zhang, H.; Ren, Q.; Zhang, D.; Zeng, F.; Zhu, X.; Zhang, L. An automatic identification method of imbalanced lithology
based on Deep Forest and K-means SMOTE. Geoenergy Sci. Eng. 2023, 224, 211595. [CrossRef]
98. Chanchotisatien, P.; Vong, C. Feature engineering and feature selection for fault type classification from dissolved gas values in
transformer oil. In Proceedings of the ICSEC 2021—25th International Computer Science and Engineering Conference, Chiang
Rai, Thailand, 18–20 November 2021; pp. 75–80. [CrossRef]
99. de Jesus Rocha Santos, M.; de Salvo Castro, A.O.; Leta, F.R.; De Araujo, J.F.M.; de Souza Ferreira, G.; de Araújo Santos, R.; de
Campos Lima, C.B.; Lima, G.B.A. Statistical analysis of offshore production sensors for failure detection applications / Análise
estatística dos sensores de produção offshore para aplicações de detecção de falhas. Braz. J. Dev. 2021, 7, 85880–85898. [CrossRef]
100. Ali, M.; Zhu, P.; Jiang, R.; Huolin, M.; Ehsan, M.; Hussain, W.; Zhang, H.; Ashraf, U.; Ullaah, J.; Ullah, J. Reservoir characterization
through comprehensive modeling of elastic logs prediction in heterogeneous rocks using unsupervised clustering and class-based
ensemble machine learning. Appl. Soft Comput. 2023, 148, 110843. [CrossRef]
101. Salamai, A.A. Deep learning framework for predictive modeling of crude oil price for sustainable management in oil markets.
Expert Syst. Appl. 2023, 211, 118658. [CrossRef]
102. Ashayeri, C.; Jha, B. Evaluation of transfer learning in data-driven methods in the assessment of unconventional resources. J. Pet.
Sci. Eng. 2021, 207, 109178. [CrossRef]
103. Vuttipittayamongkol, P.; Tung, A.; Elyan, E. A Data-Driven Decision Support Tool for Offshore Oil and Gas Decommissioning.
IEEE Access 2021, 9, 137063–137082. [CrossRef]
104. Song, T.; Zhu, W.; Chen, Z.; Jin, W.; Song, H.; Fan, L.; Yue, M. A novel well-logging data generation model integrated with
random forests and adaptive domain clustering algorithms. Geoenergy Sci. Eng. 2023, 231, 212381. [CrossRef]
105. Awuku, B.; Huang, Y.; Yodo, N. Predicting Natural Gas Pipeline Failures Caused by Natural Forces: An Artificial Intelligence
Classification Approach. Appl. Sci. 2023, 13, 4322. [CrossRef]
106. Al-Mudhafar, W.J.; Abbas, M.A.; Wood, D.A. Performance evaluation of boosting machine learning algorithms for lithofacies
classification in heterogeneous carbonate reservoirs. Mar. Pet. Geol. 2022, 145, 105886. [CrossRef]
107. Wen, H.; Liu, L.; Zhang, J.; Hu, J.; Huang, X. A hybrid machine learning model for landslide-oriented risk assessment of
long-distance pipelines. J. Environ. Manag. 2023, 342, 118177. [CrossRef] [PubMed]
108. Otchere, D.A.; Ganat, T.O.A.; Nta, V.; Brantson, E.T.; Sharma, T. Data analytics and Bayesian Optimised Extreme Gradient
Boosting approach to estimate cut-offs from wireline logs for net reservoir and pay classification. Appl. Soft Comput. 2022, 120,
108680. [CrossRef]
109. Gamal, H.; Elkatatny, S.; Alsaihati, A.; Abdulraheem, A. Intelligent Prediction for Rock Porosity While Drilling Complex Lithology
in Real Time. Comput. Intell. Neurosci. 2021, 2021, 9960478. [CrossRef]
110. Ismail, M.F.H.; May, Z.; Asirvadam, V.S.; Nayan, N.A. Machine-Learning-Based Classification for Pipeline Corrosion with Monte
Carlo Probabilistic Analysis. Energies 2023, 16, 3589. [CrossRef]
111. Prasojo, R.A.; Putra, M.A.A.; Ekojono; Apriyani, M.E.; Rahmanto, A.N.; Ghoneim, S.S.; Mahmoud, K.; Lehtonen, M.; Darwish,
M.M. Precise transformer fault diagnosis via random forest model enhanced by synthetic minority over-sampling technique.
Electr. Power Syst. Res. 2023, 220, 109361. [CrossRef]
112. Ma, Z.; Chang, H.; Sun, Z.; Liu, F.; Li, W.; Zhao, D.; Chen, C. Very Short-Term Renewable Energy Power Prediction Using XGBoost
Optimized by TPE Algorithm. In Proceedings of the 2020 4th International Conference on HVDC (HVDC), Xi’an, China, 6–9
November 2020; pp. 1236–1241. [CrossRef]
113. Ma, S.; Jiang, Z.; Liu, W. Modeling Drying-Energy Consumption in Automotive Painting Line Based on ANN and MLR for
Real-Time Prediction. Int. J. Precis. Eng. Manuf. Technol. 2019, 6, 241–254. [CrossRef]
114. Guo, Z.; Wang, H.; Kong, X.; Shen, L.; Jia, Y. Machine Learning-Based Production Prediction Model and Its Application in
Duvernay Formation. Energies 2021, 14, 5509. [CrossRef]
115. Ibrahim, N.M.; Alharbi, A.A.; Alzahrani, T.A.; Abdulkarim, A.M.; Alessa, I.A.; Hameed, A.M.; Albabtain, A.S.; Alqahtani, D.A.;
Alsawwaf, M.K.; Almuqhim, A.A. Well Performance Classification and Prediction: Deep Learning and Machine Learning Long
Term Regression Experiments on Oil, Gas, and Water Production. Sensors 2022, 22, 5326. [CrossRef] [PubMed]
116. Yin, H.; Liu, C.; Wu, W.; Song, K.; Dan, Y.; Cheng, G. An integrated framework for criticality evaluation of oil & gas pipelines
based on fuzzy logic inference and machine learning. J. Nat. Gas Sci. Eng. 2021, 96, 104264. [CrossRef]
117. Chen, H.; Zhang, C.; Jia, N.; Duncan, I.; Yang, S.; Yang, Y. A machine learning model for predicting the minimum miscibility
pressure of CO2 and crude oil system based on a support vector machine algorithm approach. Fuel 2021, 290, 120048. [CrossRef]
118. Naserzadeh, Z.; Nohegar, A. Development of HGAPSO-SVR corrosion prediction approach for offshore oil and gas pipelines. J.
Loss Prev. Process. Ind. 2023, 84, 105092. [CrossRef]
119. Yuan, Z.; Chen, L.; Liu, G.; Shao, W.; Zhang, Y.; Yang, W. Physics-based Bayesian linear regression model for predicting length of
mixed oil. Geoenergy Sci. Eng. 2023, 223, 211466. [CrossRef]
120. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken,
NJ, USA, 2015.
121. McCuen, R.H. Modeling Hydrologic Change: Statistical Methods; CRC Press: Boca Raton, FL, USA, 2016.
122. Liu, J.; Zhao, Z.; Zhong, Y.; Zhao, C.; Zhang, G. Prediction of the dissolved gas concentration in power transformer oil based on
SARIMA model. Energy Rep. 2022, 8, 1360–1367. [CrossRef]
123. Li, X.; Guo, X.; Liu, L.; Cao, Y.; Yang, B. A novel seasonal grey model for forecasting the quarterly natural gas production in China.
Energy Rep. 2022, 8, 9142–9157. [CrossRef]
124. Rashidi, S.; Mehrad, M.; Ghorbani, H.; Wood, D.A.; Mohamadian, N.; Moghadasi, J.; Davoodi, S. Determination of bubble point
pressure & oil formation volume factor of crude oils applying multiple hidden layers extreme learning machine algorithms. J. Pet.
Sci. Eng. 2021, 202, 108425. [CrossRef]
125. Gong, X.; Liu, L.; Ma, L.; Dai, J.; Zhang, H.; Liang, J.; Liang, S. A Leak Sample Dataset Construction Method for Gas Pipeline
Leakage Estimation Using Pipeline Studio. In Proceedings of the International Conference on Advanced Mechatronic Systems
(ICAMechS), Tokyo, Japan, 9–12 December 2021; pp. 28–32. [CrossRef]
126. Chung, S.; Loh, A.; Jennings, C.M.; Sosnowski, K.; Ha, S.Y.; Yim, U.H.; Yoon, J.-Y. Capillary flow velocity profile analysis on
paper-based microfluidic chips for screening oil types using machine learning. J. Hazard. Mater. 2023, 447, 130806. [CrossRef]
[PubMed]
127. Mohamadian, N.; Ghorbani, H.; Wood, D.A.; Mehrad, M.; Davoodi, S.; Rashidi, S.; Soleimanian, A.; Shahvand, A.K. A
geomechanical approach to casing collapse prediction in oil and gas wells aided by machine learning. J. Pet. Sci. Eng. 2021, 196,
107811. [CrossRef]
128. Sabah, M.; Mehrad, M.; Ashrafi, S.B.; Wood, D.A.; Fathi, S. Hybrid machine learning algorithms to enhance lost-circulation
prediction and management in the Marun oil field. J. Pet. Sci. Eng. 2021, 198, 108125. [CrossRef]
129. Shi, J.; Xie, W.; Huang, X.; Xiao, F.; Usmani, A.S.; Khan, F.; Yin, X.; Chen, G. Real-time natural gas release forecasting by using
physics-guided deep learning probability model. J. Clean. Prod. 2022, 368, 133201. [CrossRef]
130. Machado, A.P.F.; Vargas, R.E.V.; Ciarelli, P.M.; Munaro, C.J. Improving performance of one-class classifiers applied to anomaly
detection in oil wells. J. Pet. Sci. Eng. 2022, 218, 110983. [CrossRef]
131. Zhou, J.; Liu, B.; Shao, M.; Yin, C.; Jiang, Y.; Song, Y. Lithologic classification of pyroclastic rocks: A case study for the third
member of the Huoshiling Formation, Dehui fault depression, Songliao Basin, NE China. J. Pet. Sci. Eng. 2022, 214, 110456.
[CrossRef]
132. Zhang, G.; Wang, Z.; Mohaghegh, S.; Lin, C.; Sun, Y.; Pei, S. Pattern visualization and understanding of machine learning models
for permeability prediction in tight sandstone reservoirs. J. Pet. Sci. Eng. 2021, 200, 108142. [CrossRef]
133. Zuo, Z.; Ma, L.; Liang, S.; Liang, J.; Zhang, H.; Liu, T. A semi-supervised leakage detection method driven by multivariate time
series for natural gas gathering pipeline. Process. Saf. Environ. Prot. 2022, 164, 468–478. [CrossRef]
134. Chen, Z.; Yu, W.; Liang, J.-T.; Wang, S.; Liang, H.-C. Application of statistical machine learning clustering algorithms to improve
EUR predictions using decline curve analysis in shale-gas reservoirs. J. Pet. Sci. Eng. 2022, 208, 109216. [CrossRef]
135. Fernandes, W.; Komati, K.S.; Gazolli, K.A.d.S. Anomaly detection in oil-producing wells: A comparative study of one-class
classifiers in a multivariate time series dataset. J. Pet. Explor. Prod. Technol. 2023, 14, 343–363. [CrossRef]
136. Gao, G.; Hazbeh, O.; Rajabi, M.; Tabasi, S.; Ghorbani, H.; Seyedkamali, R.; Shayanmanesh, M.; Radwan, A.E.; Mosavi, A.H.
Application of GMDH model to predict pore pressure. Front. Earth Sci. 2023, 10, 1043719. [CrossRef]
137. Cirac, G.; Farfan, J.; Avansi, G.D.; Schiozer, D.J.; Rocha, A. Deep hierarchical distillation proxy-oil modeling for heterogeneous
carbonate reservoirs. Eng. Appl. Artif. Intell. 2023, 126, 107076. [CrossRef]
138. Dayev, Z.; Shopanova, G.; Toksanbaeva, B.; Yetilmezsoy, K.; Sultanov, N.; Sihag, P.; Bahramian, M.; Kıyan, E. Modeling the flow
rate of dry part in the wet gas mixture using decision tree/kernel/non-parametric regression-based soft-computing techniques.
Flow Meas. Instrum. 2022, 86, 102195. [CrossRef]
139. Das, S.; Paramane, A.; Chatterjee, S.; Rao, U.M. Sensing Incipient Faults in Power Transformers Using Bi-Directional Long
Short-Term Memory Network. IEEE Sens. Lett. 2023, 7, 7000304. [CrossRef]
140. Gao, J.; Li, Z.; Zhang, M.; Gao, Y.; Gao, W. Unsupervised Seismic Random Noise Suppression Based on Local Similarity and
Replacement Strategy. IEEE Access 2023, 11, 48924–48934. [CrossRef]