
algorithms

Article
HRIDM: Hybrid Residual/Inception-Based Deeper Model for
Arrhythmia Detection from Large Sets of 12-Lead ECG Recordings
Syed Atif Moqurrab , Hari Mohan Rai * and Joon Yoo *

School of Computing, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si 13120, Republic of Korea; [email protected]
* Correspondence: [email protected] (H.M.R.); [email protected] (J.Y.)

Abstract: Heart diseases such as cardiovascular disease and myocardial infarction are the foremost causes of death in the world. The timely, accurate, and effective prediction of heart diseases is crucial for saving lives. Electrocardiography (ECG) is a primary non-invasive method to identify cardiac abnormalities. However, manual interpretation of ECG recordings for heart disease diagnosis is a time-consuming and error-prone process. For the accurate and efficient detection of heart diseases from the 12-lead ECG dataset, we have proposed a hybrid residual/inception-based deeper model (HRIDM). In this study, we have utilized ECG datasets from various sources, which together form a large, multi-institutional ECG dataset. The proposed model is trained on 12-lead ECG data from over 10,000 patients. We have compared the proposed model with several state-of-the-art (SOTA) models, such as LeNet-5, AlexNet, VGG-16, ResNet-50, Inception, and LSTM, on the same training and test datasets. To demonstrate the computational efficiency of the proposed model, we trained it for only 20 epochs without GPU support and achieved an accuracy of 50.87% on the test dataset for 27 categories of heart abnormalities. We found that our proposed model outperformed the previous studies which participated in the official PhysioNet/CinC Challenge 2020, ranking fourth as compared with the 41 officially ranked teams. The results of this study indicate that the proposed model is a promising new method for predicting heart diseases using 12-lead ECGs.

Keywords: cardiac abnormality detection; 12-lead ECG; deep learning; large dataset

1. Introduction

Heart diseases are a leading cause of death worldwide. Coronary heart diseases include arrhythmias, prolapsed mitral valves, coronary artery disease, congenital heart disease, congestive heart failure, and many others [1,2]. Various traditional approaches such as blood tests, chest X-rays, and ECGs are used to detect such diseases [3]. ECG is a widely used approach and is recognized as the most effective means of detecting heart problems in the current era [4]. It is a painless procedure to monitor heart health and is used to detect various heart conditions, including arrhythmias and blockages in arteries that cause chest pain or even lead to a heart attack [1,5]. Precise diagnosis of irregular heartbeats through ECG analysis can significantly contribute to the early detection of cardiac illness. Extracting the relevant and significant information from the ECG signals using computer systems poses a considerable challenge. The automatic identification and categorization of cardiac abnormalities has the potential to assist clinicians in diagnosing a growing number of ECGs [6,7]. However, accomplishing this task is highly challenging.
Although various machine learning algorithms have been developed over the past decades for classifying cardiac abnormalities, their testing has often been limited to small or homogeneous datasets [8–11]. To address this issue, the PhysioNet/CinC Challenge 2020 has provided a substantial number of datasets from diverse sources worldwide. The extensive dataset offers a valuable opportunity to develop automatic systems that are capable of effectively classifying cardiac abnormalities. Previously, several ML techniques have been
employed to classify heart diseases based on raw ECG data, reflecting the growing interest
in the automated detection of abnormal behavior in ECG signals. Recently, DL has emerged
as a valuable tool in biomedical and healthcare systems by learning intricate patterns and
features from signal data [11–13]. In DL, various deep neural networks such as recurrent
neural networks (RNNs) [14], deep residual networks (ResNets) [15], transformers, and
attention networks have been developed to classify the diseases. ResNet is one of the
popular networks and it handles the complexity of large datasets efficiently with its unique
deep layer-based architecture.
Despite the many advancements in the field of machine learning (ML) and DL
for ECG signal analysis, the existing solutions are limited in their ability to accurately
detect cardiac abnormalities, especially in many categories. These limitations highlight
the need for more robust and generalized DL models that can effectively handle large
and diverse datasets. The challenge lies in developing models that not only improve
detection accuracy but also enhance computational efficiency and generalizability. In order
to address the shortcomings of other existing methods, this work presents the HRIDM
(Hybrid Residual/Inception-based Deeper Model), a unique deep learning model. This
work aims to enhance the detection efficiency of heart disease using a large 12-lead ECG
dataset through DL neural networks. Specifically, this study aims to:
• Develop a novel deep learning model for ECG abnormality detection, which could out-
perform the prior state-of-the-art (SOTA) models, considerably enhancing performance;
• Utilize the annotated dataset provided by the PhysioNet/CinC Challenge 2020 to
accurately classify the 27 different cardiac abnormalities, demonstrating the model’s
capability to handle complex and large-scale datasets effectively;
• Train the proposed model on a large 12-lead ECG dataset from over 10,000 patients, to
improve its generalizability and robustness;
• Illustrate the computational efficiency of the proposed model by achieving high accu-
racy with limited resources, such as minimal training epochs and no GPU support;
• Benchmark the HRIDM against several SOTA models, including Inception, LeNet-5,
AlexNet, VGG-16, ResNet-50, and LSTM, to validate its performance.

2. Literature Review
Numerous studies have been carried out on the application of ML/DL for the analysis
of ECG signals [3,11] and the detection of cardiac arrhythmias [16,17]. In [18], authors rec-
ommended a Residual CNN-GRU-based deep learning model with an attention mechanism
for classifying cardiac disorders. In this work, they utilized 24 groups for classification out
of 27. The proposed approach attained a test accuracy of 12.2% and obtained 30th position
among 41 teams in the official ranking. The authors of [19] introduced a modified ResNet
model that includes numerous basic blocks and four modified residual blocks. The updated
model is a 1D-CNN that supports attention across feature maps. The authors employed
fine-tuning on the pre-trained model and their proposed algorithm attained a test accuracy
of 20.8%. The authors of [20] presented a 1D-CNN with global skip connections to classify
ECG signals from 12-lead ECG recordings into numerous classes. The authors have also
utilized many preprocessing and learning methods such as a customized loss function,
Bayesian threshold optimization, and a dedicated classification layer. Their implemented
technique produced an accuracy of 20.2% on the test dataset. The authors of [21] introduced
an SE-ResNet model with 34 layers in the DL model to classify cardiac arrhythmias. They
assigned different weights to different classes based on their similarity and utilized these
weights in metric calculations. The utilized DL model attained a validation accuracy of
65.3% and test accuracy of 35.9%. In [22], the authors presented a hybrid Recurrent Con-
volutional Neural Network (CRNN) with 49 1D convolutional layers, along with 16 skip
connections and one Bi-LSTM layer to detect heart abnormalities using a 12-Lead ECG sig-
nal. Utilizing the proposed model with 10-fold cross validation and without preprocessing,
the authors achieved 62.3% validation accuracy, and 38.2% test accuracy. The authors of [23]
utilized an SE-ECGNet to detect arrhythmias from 12-lead ECG recordings. In their model
design, they utilized squeeze-and-excitation networks in each model path and an attention
mechanism which learns feature weights corresponding to loss. In this study, the authors
achieved 64.0% validation accuracy and 41.1% test accuracy using the proposed model.
In [24], the authors presented a deep 1D-CNN model consisting of exponentially dilated
causal convolutions in its structure. Their proposed model achieved a challenge score of
0.565 ± 0.005 and an AU-ROC of 0.939 ± 0.004 using 10-fold cross validation techniques.
In this work, they achieved 41.7% accuracy on the test dataset utilizing the proposed model.
In [25], the authors introduced an 18-layer residual CNN for arrhythmia classification
which has four stages for abnormality classification. The authors have also utilized prepro-
cessing techniques, 10-fold cross validation, and post-training procedure refinement. The
proposed approach achieved 69.5% validation score and 42% test accuracy. The authors
of [26] utilized a hybrid DL model by integrating a CNN with LSTM along with adversarial
domain generalization for the detection of arrhythmias from 12-lead ECG signals. In this
study, the proposed model obtained 43.7% accuracy on the test dataset. In [27], the authors
presented a method for ECG signal classification in which they utilized scatter transform in
combination with deep residual networks (ResNets). In their study, the authors obtained
48% test accuracy utilizing their proposed methodology. In [28], the authors designed an
SE-ResNet-based DL model which is a variant of the ResNet architecture. In their model
design, SE blocks are utilized to learn from the first 10 and 30 s segments of ECG signals.
The authors also utilized an external open-source dataset for model validation. To correct
and verify the output, they developed a rule-based bradycardia model based on clinical
knowledge. Utilizing the proposed approach, the authors detected heart arrhythmias from
12-lead ECG recordings and obtained a validation accuracy of 68.2% and a testing accuracy
of 51.4%. The authors of [29] introduced a DL method for the classification of arrhythmias
utilizing 12-lead ECG signals. In their proposed approach, the authors presented a modified
ResNet with SE blocks. Additionally, they applied zero padding to extend the signal to
4096 samples and downsampled them to 257 Hz. Utilizing custom weighted accuracy
measure and 5-fold cross validation, they obtained a validation accuracy of 68.4%, and a
test accuracy of 52%.
In [30], the authors proposed a novel approach using a Wide and Deep Transformer
Neural Network for the detection of cardiac abnormalities utilizing 12-lead ECG recordings.
In their methodology, they combine two features: transformer neural network features and
random forest handcrafted ECG features. The utilized approach achieved an impressive
accuracy of 58.7% on the validation dataset and 53.3% on the test dataset.

3. Materials and Methods


In this paper, we developed a new model (HRIDM) for classifying ECG signal ab-
normalities that integrates the strength of an inception network with residual blocks. The
deep inception network can learn complex features from the dataset, whereas residual
blocks improve model accuracy by resolving the problem of vanishing gradients. We
validated the proposed model using a dataset of 12-lead ECG signals from patients with
a range of cardiac disorders. We evaluated our model’s performance against a variety of
SOTA models, including LeNet, AlexNet, VGG, LSTM, ResNet, and Inception. All the
models were trained on the PhysioNet/CinC Challenge 2020 dataset and tested using the
independent test dataset provided by the organizers. The reported accuracy was achieved
through our own testing and validated using the validation techniques provided by the
PhysioNet/CinC Challenge 2020 organizers. We observed that our model outperformed
DNNs and the models outlined in previous research. The improved outcomes indicate that
the proposed model is a novel and promising approach to classifying ECG data in order to
identify cardiac anomalies.

3.1. Datasets
The study's dataset, which includes recordings, diagnostic data, and demographic information, was collected from several open-source databases that are freely available to download (Table 1). To generate this large dataset, five different sources were used. All the datasets contain 12-lead ECG recordings whose sampling frequencies range from 257 Hz to 1 kHz. The datasets also include demographic information such as age, sex, and type of diagnosis. There are 27 categories of ECG classes (diagnoses), which are presented along with SNOMED CT codes (Systematized Nomenclature of Medicine Clinical Terms). The following subsections detail the specific sources comprising the dataset.

Table 1. Dataset description.

Database      Total Recordings   Recordings in Test Set   Recordings in Validation Set   Recordings in Training Set   Total Patients
CPSC          13,256             1463                     1463                           10,330                       9458
INCART        74                 0                        0                              74                           32
PTB           22,353             0                        0                              22,353                       19,175
G12EC         20,678             5167                     5167                           10,344                       15,742
Undisclosed   10,000             10,000                   0                              0                            Unknown
Total         66,361             16,630                   6630                           43,101                       Unknown

1. CPSC Database (CPSC2018): The initial source of the ECG data is the China Phys-
iological Signal Challenge 2018 [31], which contains 13,256 ECG recordings and
9458 patients;
2. INCART Database: The second source of the ECG data is the 12-lead ECG arrhythmias
dataset, which is an open-source, publicly available dataset from the St. Petersburg
Institute of Cardiological Technics (INCART), St. Petersburg, Russia [32]. This dataset
has only 74 recordings and was contributed by 32 patients;
3. PTB and PTB-XL Database: The third dataset is a combination of two databases
(PTB and PTB-XL), which contains 22,353 ECG recordings and was contributed by
19,175 patients;
4. Georgia 12-lead ECG Challenge (G12EC) Database: The fourth ECG dataset is also
a 12-lead ECG dataset, which was made available by Emory University, Atlanta,
Georgia, USA [32]. This dataset was collected from 15,742 patients and it contains
20,678 ECG recordings;
5. Undisclosed: This dataset was only used for testing the model performance in the
challenge. The source of the dataset is an undisclosed American institution, and this
dataset is completely different from the other datasets. This dataset has never been
posted or disclosed publicly and will not be disclosed in future either. It contains
10,000 recordings, the number of patients is unknown, and no training and validation
sets are used from this dataset.
Detailed information about the datasets is provided in Appendix A, specifically in
Table A2. This table includes the number of recordings, mean duration of recordings, mean
age of patients, sex of patients, and the sampling frequency for each dataset included in the
PhysioNet/CinC Challenge 2020 dataset [33].
The ECG dataset used in this study was obtained from the PhysioNet/CinC Challenge
2020 database [32,33]. The dataset consists of 66,361 ECG recordings, of which 43,101 are
for training, 6630 for validation, and 16,630 for testing. To train the model efficiently, we
utilized the 43,101 training recordings and split them into training and validation sets in
a 90:10 ratio, resulting in 38,790 training samples and 4311 validation samples. To ensure
rigorous model evaluation and prevent data leakage, we opted to partition the provided
training dataset (excluding the designated validation set of 6630 recordings) into separate
training and validation subsets. This approach allowed for robust model training and
hyperparameter tuning without compromising evaluation integrity. For testing, we used an
undisclosed hidden dataset of 10,000 recordings, which has different sampling frequencies.
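To make the data handling concrete, the short Python sketch below shows one way to load a single PhysioNet/CinC 2020 record and to reproduce the 90:10 training/validation split described above. It is a minimal illustration under our own assumptions, not the authors' code: the file layout (a .mat signal file whose array is stored under the "val" key, with a matching .hea header) follows the public challenge release, and the helper names load_record and split_train_val are ours.

import numpy as np
from scipy.io import loadmat

def load_record(record_path):
    """Load one 12-lead ECG recording (.mat) and its header (.hea).

    Assumes the PhysioNet/CinC 2020 file layout; `record_path` is given
    without an extension, e.g. 'training/A0001'.
    """
    signal = np.asarray(loadmat(record_path + ".mat")["val"], dtype=np.float32)
    with open(record_path + ".hea", "r") as f:
        header_lines = f.readlines()
    return signal, header_lines

def split_train_val(record_ids, val_fraction=0.10, seed=42):
    """Shuffle record identifiers and split them 90:10 into training/validation sets."""
    rng = np.random.default_rng(seed)
    ids = np.array(record_ids)
    rng.shuffle(ids)
    n_val = int(len(ids) * val_fraction)
    return ids[n_val:], ids[:n_val]

# Example usage with hypothetical record identifiers:
# train_ids, val_ids = split_train_val(["A0001", "A0002", "A0003"])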
Figure 1 shows the distribution of ECG signal lengths in the dataset. The figure shows that 95% of the ECG signals in the dataset have a length of 5000 samples. The remaining 5% of the ECG signals have lengths that range from 5500 to 115,200 samples.

Figure 1. Data distribution based on signal length.
Figure 2 visualizes the distribution of cardiac abnormalities in the utilized datasets. The vertical axis represents the number of abnormalities, and the horizontal axis lists the names of the 27 classes in abbreviated form. The abbreviations used, corresponding to the Diagnosis and SNOMED CT codes, are provided in Appendix A, Table A1. It is worth noting that "Sinus Rhythm (SNR)" appears as the most frequent abnormality in the graph.

Figure 2. Distribution of ECG signals across 27 diagnosis categories.
3.2. Data Preprocessing

Our hybrid residual/inception-based deep model (HRIDM) machine learning strategy for classifying ECG signals focuses on various preprocessing approaches. Our ECG dataset from PhysioNet is large and diverse, with recordings of different lengths and sizes (42,720). Hence, preprocessing is essential to prepare the data for useful analysis. We use Algorithm 1's multi-step preprocessing approach to make sure the data are standardized and appropriate for use in machine learning models. In order to provide effective training, this iterative method serves as a generator function, continuously producing batches of features and labels.

Algorithm 1: Data preprocessing algorithm utilized for ECG data.

Initialization: {gen_x: generator for features, gen_y: generator for labels, X_train: training records, y_train: training labels}
Input: {gen_x, gen_y, X_train, y_train}
Output: {X_batch, y_batch}

# Step 1: Initialize parameters and data structures
Set batch_size
Create order_array

# Step 2: Generate batches
While True:
    Initialize empty arrays for batch_features, batch_labels
    For each batch:
        For i in range(batch_size):
            batch_features[i] = next(gen_x)
            batch_labels[i] = next(gen_y)
        X_normalized = (batch_features − X̄_batch_features) / σ_batch_features
        Yield (X_normalized, batch_labels)

# Step 3: Shuffle labels
While True:
    For i in order_array:
        Yield shuffled labels: y_shuffled = y_train[i]

# Step 4: Preprocess features
While True:
    For i in order_array:
        Load and preprocess feature data:
            data, header_data = load_data(X_train[i])
            X_train_padded = pad_sequences(data, maxlen = 5000, truncating = 'post', padding = 'post')
            X_train_reshaped = X_train_padded.reshape(5000, 12)
            X_train_normalized = (X_train_reshaped − X̄_train_reshaped) / σ_train_reshaped
            Yield (X_train_normalized)
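A compact Python sketch of the generator logic in Algorithm 1 is given below. It is illustrative rather than the authors' exact implementation: the helper load_data is assumed to return the raw signal and header for one recording (as in Algorithm 1), and Keras' pad_sequences is used for the zero padding and truncation steps.

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def feature_generator(X_train, order_array, load_data, max_len=5000, n_leads=12):
    """Yield one preprocessed (5000, 12) ECG array per training record (Step 4)."""
    while True:
        for i in order_array:
            data, header_data = load_data(X_train[i])        # raw 12-lead signal + header
            padded = pad_sequences(data, maxlen=max_len, dtype="float32",
                                   truncating="post", padding="post")
            reshaped = padded.reshape(max_len, n_leads)
            # Per-record normalization with the mean and standard deviation
            yield (reshaped - reshaped.mean()) / (reshaped.std() + 1e-8)

def label_generator(y_train, order_array):
    """Yield labels in the same shuffled order (Step 3)."""
    while True:
        for i in order_array:
            yield y_train[i]

def batch_generator(gen_x, gen_y, batch_size=32, max_len=5000, n_leads=12):
    """Assemble batches of features and labels and normalize each batch (Step 2)."""
    while True:
        batch_features = np.zeros((batch_size, max_len, n_leads), dtype="float32")
        batch_labels = []
        for i in range(batch_size):
            batch_features[i] = next(gen_x)
            batch_labels.append(next(gen_y))
        X_norm = (batch_features - batch_features.mean()) / (batch_features.std() + 1e-8)
        yield X_norm, np.asarray(batch_labels)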

The basic idea behind Algorithm 1 is the designation of a batch size, which establishes
how many data points are processed simultaneously during training. Additionally, in order
to prevent the model from gaining biases from the original recording sequence, we shuffle the
order of the training data points. Several generator functions retrieve features and labels
for each data point inside each batch, most frequently by gaining access to external data
sources. On the obtained features, we apply normalization to compensate for possible
differences in signal strength between recordings. By scaling the features to a particular
range (often between 0 and 1) based on the mean and standard deviation of the current
batch, this normalization makes sure that each feature contributes equally during the
training process [34]. Another loop shuffles the labels in the training set while batch
generation is taking place. By preventing the model from picking up possible dependencies
based on the initial label order, this step eventually enhances the model’s capacity to
generalize to unseen inputs.
Preparing each individual feature is an additional vital component of preprocessing;
this is achieved via an additional loop that iterates over the shuffled order array. Here, we
use the current index in the order array to access a particular training data point, which
is a raw ECG signal [35,36]. Using zeros as padding ensures that all of the input data
points have the same format. This is required if the duration of the recovered ECG signal
is less than the required input size, which is usually 5000 samples. The data are then
rearranged into a two-dimensional structure with 12 columns (representing the 12 ECG
leads) and 5000 rows (representing time samples) after padding. Through this reshaping,
the one-dimensional data are effectively transformed into a format that allows the machine
learning model to consider each ECG lead as an independent channel.
In the last stage of preprocessing, we have utilized the ECG data to normalize the whole
training set using the pre-calculated mean X and standard deviation σ. The normalization
of ECG data is given by Equation (1) [37].

Xnormalized = (batch_features − X̄batch_features) / σbatch_features        (1)
As a result, the model learns more robustly, and all characteristics are normalized on an identical scale across the training phase [38,39]. We produce batches of preprocessed features and shuffled labels continually by executing these processes recursively within the generator function. The machine learning model is now able to train on the information we have provided for accurate ECG signal classification in medical applications with ease due to our carefully developed preprocessing steps, which successfully tackle the issues posed by our sizable and varied ECG dataset. The preprocessing technique utilized in this work is presented in the form of a flow chart in Figure 3, and Figure 4 presents the segmented and preprocessed 12-lead ECG signals.

Figure 3. The flowchart of the utilized preprocessing technique.

Figure 4. Segmented and preprocessed 12-lead ECG signals.

3.3. SOTA Models

The SOTA models utilized to validate the proposed methodologies and the proposed model (HRIDM) are LeNet-5, AlexNet, VGG16, ResNet50, Inception, and LSTM. These are prominent and commonly utilized models for various tasks, especially signal and image classification.
LeNet-5 was the first basic convolutional neural network (CNN) model, introduced in 1998 by Yann LeCun et al. [40]. It consists of seven layers: three convolution (Conv) layers, two pooling layers (average pooling), and two fully connected (FC) layers, along with sigmoid or tanh activation functions. It was the first CNN model successfully trained on the MNIST dataset for a digit recognition task. However, because its structure contains relatively few layers, it is not suitable for more complex tasks.
AlexNet was first introduced in 2012 by Alex Krizhevsky et al. [41]; it introduced the ReLU activation function and used dropout layers to overcome overfitting [42]. AlexNet consists of eight layers: five Conv layers, of which the first, second, and fifth are followed by max-pooling layers, and three fully connected layers. All layers use ReLU activations except the output layer, which uses the SoftMax activation function. To capture hierarchies in the data, this model makes use of the filters' depth and stride. However, because it has an extensive number of parameters, its computational cost is high.
VGG16 is a prominent deep CNN model introduced in 2014 by the Visual Geometry Group [43] at Oxford University. It consists of 16 layers: 13 convolutional layers and three fully connected layers. VGG16 uses small 3 × 3 filters in the Conv layers throughout its structure, and max-pooling layers are applied after some of the Conv layers to downsample the feature maps. Due to its deeper architecture, it is very effective at capturing fine details of the features and capable of performing more complex tasks effectively. Its specialty lies in its simple and uniform structure, which is easy to design and extend. However, its limitation is slow training due to its large number of parameters and depth.
The ResNet50 model, developed in 2015 by Kaiming He et al. [44], introduced the concept of the residual block; it contains 50 layers and is capable of resolving the vanishing gradient problem in deeper networks. The residual block is mathematically defined as y = f(x) + x, where f(x) is the CNN output within the block and x is the input. This model introduced the concept of skip connections, which allow the gradient to pass directly through these connections. It also introduced a bottleneck design in its residual blocks, consisting of 1 × 1, 3 × 3, and 1 × 1 Conv layers. The 50 layers in the ResNet50 model with residual blocks enable the capture of more complex patterns. However, its limitation is the high computational cost due to its complex and deep structure.
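To make the residual identity y = f(x) + x concrete, the following minimal Keras sketch (our own illustration, not the original ResNet-50 code) builds one 1D bottleneck-style residual block with a skip connection:

from tensorflow.keras import layers

def residual_block_1d(x, filters):
    """Bottleneck-style residual block: y = f(x) + x with 1, 3, 1 kernel convolutions."""
    shortcut = x
    f = layers.Conv1D(filters // 4, 1, padding="same", activation="relu")(x)
    f = layers.Conv1D(filters // 4, 3, padding="same", activation="relu")(f)
    f = layers.Conv1D(filters, 1, padding="same")(f)
    if shortcut.shape[-1] != filters:            # match channel count on the skip path
        shortcut = layers.Conv1D(filters, 1, padding="same")(shortcut)
    y = layers.Add()([f, shortcut])              # y = f(x) + x
    return layers.Activation("relu")(y)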
The Inception model is a deep CNN model that introduced the concept of inception
modules to enhance the efficiency and accuracy of DL models. Inception v3 was introduced in 2015 by Szegedy et al. [45] and uses parallel Conv layers within the
same modules to capture very fine details at different levels. Inception v3 has 48 layers
with inception modules which consist of multiple Conv layers with different parallel filters
(1 × 1, 3 × 3, 5 × 5) along with max-pooling layers. It is capable of capturing very complex
patterns with a smaller number of parameters as compared to similar DL models.
LSTM (Long Short-Term Memory) networks, introduced in 1997 by Hochreiter and
Schmidhuber [46], are especially designed for sequential data patterns. LSTM was intro-
duced to resolve the problem of short-term memory by incorporating gates and states. An
LSTM network consists of many cells, including a cell state (ct ) and a hidden state (ht ),
as well as gates such as the input gate (it ), forget gate ( f t ), and output (ot ) gate. LSTM
has the ability to capture long-term dependencies from sequential data, making it highly
suitable for time series tasks and language modeling.

3.4. Proposed Model (HRIDM)


The aim of this research was to determine the most effective algorithm on the utilized
dataset. Figure 5 depicts the proposed model and comprehensive methodology employed.
The proposed HRIDM consists of three main sections. The first section serves as the input
layer, incorporating multiple convolutional, residual, and inception blocks to extract the
primary and fine features from the data, and is also responsible for producing output. We
integrated residual blocks with inception blocks in our proposed model because this com-
bination leverages the strength of both types of blocks, enhancing the overall performance
of the model for arrhythmia detection in 12-lead ECG recordings. The residual blocks
address the vanishing gradient problem and enable deeper training through skip
connections, which allows the model to learn complex features effectively, whereas
the inception blocks capture multiscale, fine-grained features through parallel Conv
blocks with varying filter sizes. The fusion of both techniques provides
the powerful structure of the DL model, enabling it to efficiently and effectively learn the
diverse features and enhancing its ability to discriminate between different arrhythmia
types, resulting in improved accuracy and robustness compared to other models.
The second section is connected with the first and further refines the extracted features,
subsequently concatenating them. The third section is connected with both preceding
sections (first and second) and combines the concatenated features. The output of the final
section is given to the dense layer of the first section to produce the desired output. A
detailed description of each section and its constituent blocks follows. The first section of
the proposed model consists of the following layers:
• 1D convolutional (Conv) layers: In our proposed model, we have utilized multiple 1D-
Convolution (Conv) layers for extracting high level features from the provided dataset.
The first 1D-Conv layer employs 512 kernels, each of size 5 × 5, to learn informative
patterns from the input data. The second 1D-Conv layer includes 256 kernels of
size 3 × 3, further refining the extracted features. Following each 1D-Conv layer,
we incorporate batch normalization (batchNorm) to improve training stability and
accelerate convergence. Further, ReLU activation functions are included to introduce
non-linearity and improve the model’s ability to learn complex relationships within
the data. The convolutional layer computes the output Output[i, j, k] at spatial position
(i, j) and output channel k as given in Equation (2):

Output[i, j, k] = Σ(f,g,h) (Filter[f, g, h] × Input[i + f, j + g, h]) + Bias[k]        (2)
Algorithms 2024, 17, 364 10 of 28

• 1D max-pooling (MaxPool) layer: This layer is utilized to downsample the data while
preserving prominent features. The 1D-maxPool layer employs a 3 × 3 size filter with
stride 2, and it computes the output Output[i ] at position i as given in Equation (3):

Output[i ] = max(Input[ j × stride : ( j × stride) + pool_size − 1]) (3)

• Residual block: This block is used to address the vanishing gradient problem and
facilitate weight transfer. The residual block consists of three stacks, each comprising a
1D-Conv layer, batchNorm, and Leaky ReLU activation function with an alpha value
of 1 × 10−2 , which are given by Equations (4)–(6), respectively:

Yconv = σ (W × X + b) (4)

Ybn = γ · ((Yconv − µB) / σB) + β        (5)
Yout = max(α · Ybn , Ybn ) (6)
where W is the filter weights, X is the input data, b is the bias term, and σ is the
activation function (Leaky ReLU). γ and β are the scaling and shifting parameters, µB
and σB are the batch mean and standard deviation. α is the leakiness factor (0.01 in
this case).
The convolutional layer sizes for each stack are 128, 128, and 256, respectively, with a
kernel size of 1 × 1, as shown by Equations (7)–(9) for each stack, respectively.
(1) First Stack:

Ystack1 = max(α · [γ1 · ((σ(W1 × X + b1) − µB1) / σB1) + β1], γ1 · ((σ(W1 × X + b1) − µB1) / σB1) + β1)        (7)

(2) Second Stack:

Ystack2 = max(α · [γ2 · ((σ(W2 × Y1 + b2) − µB2) / σB2) + β2], γ2 · ((σ(W2 × Y1 + b2) − µB2) / σB2) + β2)        (8)

(3) Third Stack:

Ystack3 = max(α · [γ3 · ((σ(W3 × Y2 + b3) − µB3) / σB3) + β3], γ3 · ((σ(W3 × Y2 + b3) − µB3) / σB3) + β3)        (9)
To maintain weight preservation, an additional convolutional layer with 256 filters
and batch normalization is incorporated into the skip connection, linking it with the output
of the third stack in the residual block, which is shown by Equations (10) and (11):

skip = Conv1D256 ( X ) (10)

F ( X ) = Ystack3 + Conv1D256 ( X ) (11)


• Inception block: This block is used to extract further low-dimensional features. The
inception block involves stacks of 1D convolutional layers, followed by batch normal-
ization and Leaky ReLU activation with an alpha value of 1 × 10−2 . Each stack utilizes
64 filters, with kernel sizes of 1, 3, and 5.
(4) Kernel Size 1:

Yout1 = max(α · [γ1 · ((σ(W1 × X + b1) − µB1) / σB1) + β1], σ(W1 × X + b1))        (12)

(5) Kernel Size 3:

Yout3 = max(α · [γ3 · ((σ(W3 × X + b3) − µB3) / σB3) + β3], σ(W3 × X + b3))        (13)

(6) Kernel Size 5:

Yout5 = max(α · [γ5 · ((σ(W5 × X + b5) − µB5) / σB5) + β5], σ(W5 × X + b5))        (14)

Equations (12)–(14) illustrate how each stack within the inception block handles the
input data ( X ). The max operation integrates the Leaky ReLU output with a scaled and
shifted variant to ensure non-linearity and extraction of features across various receptive
fields (kernel sizes). The second and third sections contain almost identical layers, a repeti-
tive structure of convolution, batch normalization, and Leaky ReLU to progressively extract
increasingly detailed and refined features. The second section concatenates the extracted
features from its blocks with those from the first section. The third section
follows a similar set of layers but incorporates skip connections to facilitate the flow of
information across layers.
• Convolutional blocks: These blocks are used to capture complex patterns within
the data. Each convolutional block consists of a 1D-Conv layer, batchNorm, and
Leaky ReLU activation with an alpha value of 1 × 10−2 . The first convolutional block
uses 128 filters, a filter size of 5 × 5, and a stride of 1 × 1, complemented by instance
normalization and parametric ReLU activation. A dropout layer with a rate of 20% and
a 1D max-pooling layer with a filter size of 2 × 2 was added. The second convolutional
block is similar to the first, except for the filter count in the convolutional layer. It
employs 256 filters of size 11 × 11 and the third convolutional block omits the 1D
pooling layer and utilizes a Conv layer with 512 filters of size 21 × 21;
• 1D global average pooling (Global Avg. Pool) layer: This layer is utilized mainly for
reducing the dimensionality of the feature data, presented by Equation (15).

Yglobal_avg_pool = (1/N) · Σ(i=1 to N) Xi        (15)
where feature numbers are represented by N and Xi are the input features;
• Dense layer: This layer is used to classify the data. The dense layer has 27 neurons,
and each neuron is activated using a softmax function, the output probabilities being
given by Equation (16):

Zk = Σ(i=1 to N) Wki · Xi + bk,    Yk = e^(Zk) / Σ(j=1 to 27) e^(Zj),    for k = 1, 2, . . . , 27        (16)

where Zk is the class logit, Wki is the weight from input i to output k, Xi is input, bk is the
bias for k, Yk is the softmax output ensuring probabilities sum to 1 for predictions.
The proposed (HRIDM) model is an effective method for the classification of time
series data. The utilized model is capable of extracting high-level features from the data,
and it is able to capture complex patterns within the data. The model is also able to
generalize well to new data.
Figure 5. Building blocks of proposed hybrid residual/inception-based deeper model (HRIDM).
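The following Keras sketch illustrates, under our own simplifying assumptions about layer ordering and sizes, how the residual block (Equations (4)–(11)) and the inception block (Equations (12)–(14)) described above could be expressed in code. It is a schematic of the building blocks in Figure 5, not the authors' released implementation.

from tensorflow.keras import layers

ALPHA = 0.01  # Leaky ReLU slope used throughout

def conv_bn_lrelu(x, filters, kernel_size):
    """Conv1D -> BatchNorm -> Leaky ReLU stack used inside both blocks."""
    x = layers.Conv1D(filters, kernel_size, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU(ALPHA)(x)

def hridm_residual_block(x):
    """Three Conv-BN-LeakyReLU stacks (128, 128, 256 filters, kernel size 1)
    plus a Conv(256) + BatchNorm skip connection, as in Equations (7)-(11)."""
    y = conv_bn_lrelu(x, 128, 1)
    y = conv_bn_lrelu(y, 128, 1)
    y = conv_bn_lrelu(y, 256, 1)
    skip = layers.Conv1D(256, 1, padding="same")(x)
    skip = layers.BatchNormalization()(skip)
    return layers.Add()([y, skip])

def hridm_inception_block(x):
    """Parallel Conv-BN-LeakyReLU branches with kernel sizes 1, 3, and 5
    (64 filters each), concatenated along the channel axis."""
    branches = [conv_bn_lrelu(x, 64, k) for k in (1, 3, 5)]
    return layers.Concatenate()(branches)

A full model would stack such blocks after the initial Conv layers and end with global average pooling and a 27-way softmax dense layer, as described above.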

3.5. Activation Functions

Activation functions play an essential role in the design of deep learning models. The selection of the activation function depends on the type of input data and the category of classification. In this work, we utilized the Leaky ReLU and ReLU activation functions, and we compared them with the most commonly used activation functions, sigmoid and tanh, as visualized in Figure 6.

Figure 6. Comparison of output responses for ReLU, Leaky ReLU, sigmoid, and tanh activation functions.

• Leaky ReLU: We employed the Leaky ReLU activation function for the present study, which offers strong benefits for ECG classification. It is important to analyze several activation functions that are particular to our task and dataset. Negative values in
ECG signals are frequently indicative of certain types of cardiac activity. Leaky ReLU
ensures that neurons continue to contribute to learning features from the data by
keeping them from going into inactive states as a result of these negative inputs. Leaky
ReLU is computationally more efficient than tanh and sigmoid, which is advantageous
for training deeper and bigger neural networks using ECG data [47,48]. Leaky ReLU,
in contrast to ReLU, keeps a little non-zero gradient for negative inputs. This feature
might be useful in applications where it is important to detect even minute deviations
from normal cardiac rhythm in order to capture minor changes in ECG patterns. Leaky
ReLU can be mathematically expressed by Equation (17) [48]:

f(x) = { x,   x ≥ 0 ;   αx,   otherwise }        (17)
• ReLU (Rectified Linear Unit): This is simple and effective in terms of computing; it
promotes sparsity by generating a large number of zero outputs. In comparison with
sigmoid and tanh, it allows models to converge more quickly during training. But
it has the “Dying ReLU” issue, which prevents learning by allowing neurons with
negative inputs to be stuck at zero indefinitely. The mathematical definition of the
ReLU activation function is given by Equation (18) [48,49]:

f(x) = { x,   x ≥ 0 ;   0,   x < 0 }        (18)
• Sigmoid: The sigmoid function reduces input values to the range [0, 1]. It is especially
effective for binary classification problems that require probabilistic outputs. The
function has a smooth gradient over its full range, allowing for effective gradient-
based optimization during training. The sigmoid activation may be mathematically
described as illustrated in Equation (19) [49]:

f(x) = 1 / (1 + e^(−x))        (19)
• Tanh (Hyperbolic Tangent): The tanh function converts input values to the range
[−1, 1]. Similar to the sigmoid function, its output is zero-centered, which can help
neural networks converge. The tanh activation function is mathematically stated as


follows in Equation (20) [48]:

f(x) = (e^x − e^(−x)) / (e^x + e^(−x))        (20)
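A few lines of NumPy are enough to reproduce the qualitative comparison in Figure 6; this small sketch (ours, for illustration only) evaluates the four activation functions of Equations (17)–(20) on a common input range:

import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)      # Equation (17)

def relu(x):
    return np.maximum(0.0, x)                  # Equation (18)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))            # Equation (19)

def tanh(x):
    return np.tanh(x)                          # Equation (20)

x = np.linspace(-5, 5, 11)
for name, fn in [("LeakyReLU", leaky_relu), ("ReLU", relu),
                 ("Sigmoid", sigmoid), ("Tanh", tanh)]:
    print(name, np.round(fn(x), 3))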

3.6. Evaluation Metrics


This section presents an overview of our proposed model and approach for identi-
fying anomalies in ECG data. To understand the functioning of our model, we extract
characteristics and employ several metrics that provide insights into its performance. In
this research, we utilized the following metrics:

Accuracy (Acc) = (TP + TN) / (TP + TN + FP + FN) = (TP + TN) / (Total no. of ECG samples)        (21)

Precision (Pre) = TP / (TP + FP)        (22)

Recall (Rec) = TP / (TP + FN)        (23)

F1-Score = 2 · (Pre · Rec) / (Pre + Rec)        (24)
The Area Under the Receiver Operating Characteristic Curve (AUC) measure is used to
see the model’s performance graphically across different classification levels. An increased
AUC score (closer to 1) denotes better performance. The ROC curve plots the true positive rate (recall) against the false positive rate (1 − specificity) at different threshold values. When comparing a model's perfor-
mance to a single criterion (such as accuracy), AUC offers a more thorough assessment.
The total number of ECG samples in the test data is equal to the sum of the number
of positive and negative samples ( TP + TN + FP + FN ). The confusion matrix is the
foundation for assessing classification models. For a specific task, this matrix carefully
counts the number of accurate identifications (TN, True negatives; TP, True positives) and
inaccurate detections (FN, False negatives; FP, False positives). When there are significant
impacts for missing positive cases, the confusion matrix becomes very important [50].
A high recall value implies that, even at the cost of a higher level of false positives, the
model reduces the possibility of missing significant instances. This trade-off is important,
especially when the value of incorrectly detecting an instance that is negative is greater
than the cost of missing a positive one [38,51].
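As an illustration of how these metrics can be computed in practice, the short sketch below uses scikit-learn (one of the libraries listed in Section 4). The per-class averaging choice ("macro") is our assumption, since multi-class precision, recall, and F1 require one; the snippet is not the authors' evaluation code.

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

def classification_report_27(y_true, y_pred):
    """Compute Equations (21)-(24) for a 27-class prediction task."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall":    recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1_score":  f1_score(y_true, y_pred, average="macro", zero_division=0),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }

# AUC needs class probabilities rather than hard labels, e.g.:
# from sklearn.metrics import roc_auc_score
# auc = roc_auc_score(y_true_one_hot, y_prob, average="macro", multi_class="ovr")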

4. Results
This section provides a detailed outline of the experiments performed and their
corresponding outcome. The experimental setup involved implementing the experiments
primarily using Python, with most of the computations performed on a computer system
with 16 GB RAM, utilizing a Tesla T4 GPU. To optimize model performance, we utilized
a dataset comprising 43,101 training samples. This dataset was then divided into 90%
training (38,790 samples) and 10% validation (4311 samples) sets to facilitate model training
and evaluation. For testing, two categories of datasets were provided: 6630 known and
disclosed records, and 10,000 unknown and undisclosed records, totaling 16,630. However,
for this work, we utilized the undisclosed hidden dataset of 10,000 recordings provided by
the CinC 2020 organizers to ensure unbiased testing of the proposed model. Throughout
the experiment, a number of libraries were used: for data visualization, Matplotlib, Seaborn,
and Ecg-plot; for data processing, NumPy and Pandas; and for modeling, TensorFlow and
Keras. For the assessment of the models, Sklearn was used, along with other libraries like
SciPy and WFDB. The following hyperparameters were used for training all the DL models
including the proposed model: an Adam optimizer, a batch size of 32, min_delta: 0.0001, a
dropout rate of 0.2, and a filter size of 5 × 5. A learning rate decay mechanism was used as
Algorithms 2024, 17, 364 15 of 28

a callback function depending on the AUC score during the model’s training. The learning
rate was degraded by multiplying it by 0.1 in the optimizer if the AUC score did not show
improvement every epoch, suggesting a lack of convergence.
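The hyperparameter settings and the AUC-driven learning-rate decay described above could be wired up in Keras roughly as follows. This is a hedged sketch of one possible configuration: the placeholder model, the categorical cross-entropy loss, and the use of ReduceLROnPlateau are our assumptions, since the paper does not publish the exact callback implementation.

import tensorflow as tf

# Stand-in model with the expected input/output shapes (5000 samples, 12 leads,
# 27 classes); the real HRIDM architecture is described in Section 3.4.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(5000, 12)),
    tf.keras.layers.Conv1D(32, 5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(27, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="categorical_crossentropy",   # loss choice is our assumption
              metrics=["accuracy",
                       tf.keras.metrics.AUC(name="auc"),
                       tf.keras.metrics.Precision(name="precision")])

callbacks = [
    # Multiply the learning rate by 0.1 when the validation AUC stops improving,
    # checked after every epoch (patience=1) with min_delta = 0.0001.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_auc", mode="max",
                                         factor=0.1, patience=1, min_delta=0.0001),
]

# The batch size of 32 is set inside the data generators sketched earlier:
# history = model.fit(train_generator, validation_data=val_generator,
#                     epochs=20, callbacks=callbacks)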
We showcased the experimental outcomes derived from the models employed in this
investigation. LeNet, AlexNet, VGG-16, ResNet-50, Inception, and LSTM are among the
models. To ensure a fair comparison with the proposed HRIDM model, all SOTA models
were trained from scratch on the same dataset rather than utilizing pre-trained weights.
This approach allowed for a direct evaluation of model performance under identical condi-
tions. The experimental setup previously described was used for training and evaluating
these models on the dataset. The outcomes offer valuable perspectives on how well each
model performs in appropriately classifying cardiac disorders. To evaluate the models’
performance, evaluation criteria including accuracy, precision, recall, and F1-score were
used. To further illustrate the trade-off between the true positive rate and false positive rate
for various categorization levels, ROC curves were calculated. The experimental results
provide insight into how well each model detects and classifies different heart problems. In
order to satisfy the research objectives, we analyzed each model’s performance, compared
the outcomes, and expressed the advantages and disadvantages of each.

4.1. LeNet-5

The first model we trained on our training and validation dataset was LeNet-5. Figure 7 displays the model's performance as training history curves. Figure 7a illustrates the accuracy, precision, and AUC metrics for both the training and validation datasets. Figure 7b depicts the loss curve of the model. The training curves show that the LeNet-5 model obtained an accuracy of roughly 70% on the training data but only 60% on the validation data. A similar tendency may be seen using the AUC score. On both datasets, the precision score remained stable at roughly 50%. We confined the training iterations to 20 epochs in order to observe the model's learning curve. This confined training undoubtedly led to the model's underfitting, as seen by the declining precision near the end of the epochs. Furthermore, the loss curve exhibits a rather smooth fall, with values of about 20% for both training and validation data. While a falling loss curve is ideal, the absence of considerable decrease in this case implies that the model is not efficiently reducing the cost function.

Figure 7. Training history curve of LeNet-5 model on training and validation data. (a) Accuracy, precision, and AUC. (b) Loss.
4.2. AlexNet

In our second experiment, we examined the AlexNet model's training history with a particular emphasis on AUC, accuracy, loss, and precision (Figure 8). In comparison to the LeNet-5 model, AlexNet performed noticeably better. Each epoch exhibited a steady decline in the training and validation loss curves, suggesting good convergence towards the minimal loss value. Furthermore, the model consistently produced a precision of around 60% for both sets of data, and it demonstrated a stable and impressive accuracy of about 80% on both training and validation sets. Promising AUC values of around 70% during training are also shown in Figure 8a.

Figure 8. Training history curve of AlexNet model on training and validation data. (a) Accuracy, precision, and AUC. (b) Loss.

The training loss curve in Figure 8b, on the other hand, does not demonstrate the same smooth drop as the validation loss curve, indicating probable overfitting. This mismatch suggests that the model is remembering the training data rather than generalizing effectively to unknown data. Furthermore, the dataset's high volatility and imbalanced classes continue to be a concern. Addressing these data restrictions may improve the AlexNet model's performance and capacity to handle classification challenges successfully.
4.3. VGG-16

We trained the VGG-16 model on both the training and validation datasets in our third experiment, which evaluated the model. The model's accuracy, precision, AUC, and loss training history curves are displayed in Figure 9. With accuracy and AUC averaging 80% throughout the training process, VGG-16 produced very smooth training curves. The accuracy curve, however, stayed mostly stable. Despite the lack of gain in precision, VGG-16 outperformed the preceding models. The model exhibited not only consistent and noteworthy accuracy, but also outstanding accuracy of over 80% for both training and validation data. These figures demonstrate that both accuracy and AUC were around 80% throughout training (Figure 9a).

Figure 9. Training history curve of VGG-16 model on training and validation data. (a) Accuracy, precision, and AUC. (b) Loss.

While the loss curve (Figure 9b) shows a smooth decline for both training and validation data, more research into potential overfitting is needed. Similar to the preceding models, the existence of high data volatility and class imbalance is likely to impede optimal performance. Addressing these data concerns might improve the VGG-16 model's ability to handle classification tasks.
4.4. ResNet-50
In this subsection, we examine the accuracy and loss observations during the training
of the ResNet-50 model, as depicted in Figure 10. The loss plot reveals that the ResNet
model exhibited better performance compared to previous approaches when applied to
validation data. However, it is important to note that the model faced challenges in mini-
Algorithms 2024, 17, 364 17 of 28

While the loss curve (Figure 9b) shows a smooth decline for both training and vali-
dation data, more research into potential overfitting is needed. Similar to the preceding
models, the existence of high data volatility and class imbalance is likely to impede optimal
performance. Addressing these data concerns might improve the VGG-16 model’s ability
to handle classification tasks.
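One practical way to follow up on the potential overfitting noted above is to monitor the validation loss and stop training, or roll back to the best weights, once it stops improving. The following sketch uses standard Keras callbacks and is an assumed setup for illustration rather than the configuration used in the reported experiments.

```python
# Minimal sketch (assumed TensorFlow/Keras): stop training when the
# validation loss stops improving and keep the best-performing weights.
import tensorflow as tf

callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",      # watch the validation loss curve
        patience=3,              # tolerate 3 epochs without improvement
        restore_best_weights=True,
    ),
    tf.keras.callbacks.ModelCheckpoint(
        "best_model.keras",      # keep the checkpoint with the lowest val_loss
        monitor="val_loss",
        save_best_only=True,
    ),
]

# Usage (model and data are placeholders):
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=20, callbacks=callbacks)
```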

4.4. ResNet-50
In this subsection, we examine the accuracy and loss observations during the training of the ResNet-50 model, as depicted in Figure 10. The loss plot reveals that the ResNet model exhibited better performance compared to previous approaches when applied to validation data. However, it is important to note that the model faced challenges in optimizing its parameters from the start, resulting in slow learning. This was likely due to the large size of the model, which had 23.5 million parameters. The dataset contained 27 classes, with one particular class having a disproportionately large amount of training data. This imbalance led to misclassifications, as the model erroneously classified each validation data point as belonging to that specific class. Nevertheless, the training plot demonstrates that the model made progress over the course of several epochs.

Figure 10. Training history curve of ResNet-50 model on training and validation data. (a) Accuracy, precision, and AUC. (b) Loss.

Despite these constraints, the model demonstrated progress over the epochs during training, achieving an accuracy of around 85% and an AUC of around 81%. Nonetheless, precision fluctuated and stayed at a low level (around 50%). This shows that, perhaps as a result of the data variation, the model has difficulties correctly predicting some classes. These metrics suggest that the model was able to learn to distinguish between different classes to some extent. However, there is still room for improvement, as the model's performance suffered due to the significant variance present in the data. Overall, the model exhibited misclassifications, highlighting the need for further refinement.
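Since a single dominant class can pull a model towards trivial predictions, one common mitigation is to weight the loss so that rare abnormalities contribute more. The sketch below derives per-class positive weights from a multi-label target matrix and plugs them into a weighted binary cross-entropy; this is an illustrative option (assuming TensorFlow/Keras and a logit output layer without sigmoid), not the training objective used in this study.

```python
# Minimal sketch (assumed TensorFlow/Keras): weighted binary cross-entropy
# with per-class positive weights derived from label frequencies.
import numpy as np
import tensorflow as tf

def positive_class_weights(y: np.ndarray) -> np.ndarray:
    # y has shape (num_samples, num_classes) with 0/1 multi-label targets.
    positives = y.sum(axis=0)
    negatives = y.shape[0] - positives
    return negatives / np.clip(positives, 1, None)  # rarer class -> larger weight

def make_weighted_bce(pos_weight: np.ndarray):
    pos_weight = tf.constant(pos_weight, dtype=tf.float32)

    def loss(y_true, y_pred_logits):
        # Expects raw logits (no sigmoid applied on the output layer).
        return tf.reduce_mean(
            tf.nn.weighted_cross_entropy_with_logits(
                labels=tf.cast(y_true, tf.float32),
                logits=y_pred_logits,
                pos_weight=pos_weight,
            )
        )
    return loss

# Usage (y_train is a placeholder multi-label matrix):
# model.compile(optimizer="adam",
#               loss=make_weighted_bce(positive_class_weights(y_train)))
```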

4.5. Inception Network
In the fifth experiment, we trained the Inception model and plotted the training history in terms of accuracy, precision, AUC, and loss, as shown in Figure 11. These plots provide a comprehensive overview of the recorded training metrics for each epoch. The accuracy plot reveals that the model exhibited commendable performance during both the training and validation phases, with accuracy scores surpassing 95%. Notably, its precision on the validation and training sets fluctuated due to the influence of the learning rate. However, it is worth mentioning that the model's training process was not optimal, as indicated by the similar performance observed on the validation set. This phenomenon can be attributed to the excessive layering of the architecture, leading to a diminished number of features in the output feature matrix. Consequently, despite employing learning-rate decay, there was a spike in the loss for the validation data, indicating a lack of convergence compared to the gradual convergence observed for the training curves (Figure 11a).

Figure 11. Training history curve of Inception Network on training and validation data. (a) Accuracy, precision, and AUC. (b) Loss.

While the validation precision reached 74% and the validation AUC exceeded 70%, these measures did not show significant improvements on the training data, adding to the likelihood of overfitting. The positive aspect is that the loss curves (Figure 11b) for both training and validation data converged smoothly, confirming the model's overall learning capability. In this case, the precision score on the training dataset was good. These metrics suggest that the model was able to learn to distinguish between different classes to some extent. However, there is still room for improvement, as the model's performance suffered due to the significant variance present in the data. Overall, the model exhibited misclassifications, highlighting the need for further refinement.
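The learning-rate decay mentioned above can be expressed as a schedule attached to the optimizer. The snippet below shows a generic exponential decay in Keras; the initial rate, decay steps, and decay factor are illustrative values, not the settings used for the Inception experiment.

```python
# Minimal sketch (assumed TensorFlow/Keras): exponential learning-rate decay.
import tensorflow as tf

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,  # illustrative starting rate
    decay_steps=1000,            # decay applied every 1000 optimizer steps
    decay_rate=0.9,              # multiply the rate by 0.9 at each decay step
    staircase=True,
)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

# Usage (model is a placeholder):
# model.compile(optimizer=optimizer, loss="binary_crossentropy")
```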
4.6. LSTM
Our sixth experiment explored the Long Short-Term Memory (LSTM) model, which is well suited to sequential data. Figure 12 depicts the training curves for both the training and validation datasets. While LSTMs excel with sequential data, our model's performance deteriorated owing to dataset limitations. The considerable volatility in the data distribution, as well as the imbalanced class representation, caused challenges.

Figure 12. Training history curve of LSTM model on training and validation data. (a) Accuracy, precision, and AUC. (b) Loss.

The model most likely ignored the other classes in favor of predicting the one that predominated in the imbalanced dataset. This emphasizes how crucial it is to deal with data imbalances prior to training subsequent models. The data variance prevented the cost function from fully converging, even though the graphs showed that it was heading in the direction of the minimum. Compared to the earlier models, the performance was not as good. While the training accuracy increased to 70–75%, other parameters, such as precision and AUC, continued to decline, averaging 58–64% (Figure 12a). There were notable fluctuations (20–30%) in the loss values as well (Figure 12b). The difficulties of using LSTMs on imbalanced, highly variable datasets are highlighted by these findings. Although the LSTM model works well with sequential data, our dataset's constraints made it less effective in this experiment. Improving the effectiveness of LSTMs in this particular situation may require addressing data imbalance and possibly investigating methods to handle data volatility.
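For reference, an LSTM treats each 12-lead recording as a sequence of time steps with 12 channels per step. The sketch below is a minimal Keras formulation of such a model with a 27-label sigmoid output; the sequence length and layer sizes are illustrative and do not correspond to the exact LSTM configuration evaluated here.

```python
# Minimal sketch (assumed TensorFlow/Keras): LSTM over a 12-lead ECG sequence
# with a 27-way multi-label (sigmoid) output.
import tensorflow as tf

SEQ_LEN = 5000   # illustrative number of samples per recording
NUM_LEADS = 12
NUM_CLASSES = 27

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, return_sequences=True,
                         input_shape=(SEQ_LEN, NUM_LEADS)),  # per-step features
    tf.keras.layers.LSTM(32),                                # sequence summary
    tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["binary_accuracy"])
```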

4.7. Proposed Model (HRIDM)


In the final experiment, our proposed HRIDM model was trained for 20 epochs, similar to the other models. As seen in Figure 13, the model showed consistently high accuracy, with both training and validation accuracy settling between 95 and 97%. This demonstrates high learning capacity, as the model accurately captured the patterns in the training data and generalized well to the previously unseen data in the validation set. The precision graphs support the model's learning progress. Training precision was roughly 80%, whereas validation precision was around 70% (Figure 13a). These findings indicate the HRIDM model's capacity to achieve excellent accuracy and precision on both the training and validation datasets.

Figure 13. Training history curve of proposed model on training and validation data. (a) Accuracy, precision, and AUC. (b) Loss.
Figure 13b depicts the decreasing trend in both training and validation loss across the training procedure. This represents a satisfactory convergence towards the ideal solution (global minimum). Over the course of 20 epochs, the training loss dropped from above 0.15 to less than 0.090. The validation loss followed a similar pattern, beginning above 0.14 and decreasing to less than 0.085. Lower loss values suggest that the model is better able to predict the target variable (abnormality in the ECG data). Finally, the AUC values for both training and validation data remained around 70%, with an increasing trend noted near the end of training. This indicates the model's efficacy in terms of accuracy, recall (as represented by AUC), and overall classification ability. These combined findings demonstrate the HRIDM model's effectiveness. The model improved its ability to detect anomalies in the large ECG dataset by including both inception and residual attributes in its structure.

After training the models, we tested them to determine their performance on the unseen dataset, as shown in Table 2. This section provides a comprehensive analysis of the results obtained from all the implemented models, including the proposed model. As shown in Figure 14, the proposed model achieved significantly higher accuracy than the previous models, achieving a test score of 50.87%. The Inception network emerged as the second-best model, with a test score of 40.6%, outperforming most of the models mentioned in the literature. On the other hand, ResNet-50, VGG-16, and AlexNet demonstrated comparable performance, with test scores ranging from 33.8% to 35.1%. However, LeNet-5 and LSTM did not meet the performance standards. This is likely due to the highly imbalanced class distribution in the data, making it difficult for these models to learn complex patterns.

Table 2. Testing accuracy achieved by multiple SOTA models.

Model | Test Score
LeNet-5 | 15.7%
AlexNet | 33.8%
VGG16 | 34.9%
ResNet50 | 35.1%
Inception | 40.6%
LSTM | 17.9%
Proposed (HRIDM) | 50.87%
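The test scores in Table 2 correspond to evaluating each trained network once on the same held-out data. A generic way to produce such a comparison is sketched below; the checkpoint file names and the use of Keras evaluate() are assumptions for illustration, not the official challenge evaluation script.

```python
# Minimal sketch (assumed TensorFlow/Keras): score several saved models on
# the same held-out test set and tabulate the results.
import tensorflow as tf

MODEL_FILES = {            # hypothetical checkpoint names
    "LeNet-5": "lenet5.keras",
    "AlexNet": "alexnet.keras",
    "Proposed (HRIDM)": "hridm.keras",
}

def compare_on_test(x_test, y_test):
    scores = {}
    for name, path in MODEL_FILES.items():
        model = tf.keras.models.load_model(path)
        # Returns whatever metrics the model was compiled with (loss, accuracy, ...).
        scores[name] = model.evaluate(x_test, y_test, verbose=0, return_dict=True)
    return scores

# Usage (x_test, y_test are placeholders):
# for name, metrics in compare_on_test(x_test, y_test).items():
#     print(name, metrics)
```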

Figure 14. Accuracy comparison of SOTA vs. proposed model on test dataset.

To provide a visual representation of the results, Figure 15 presents the confusion matrices. The X-axis of each matrix represents the actual values, while the Y-axis represents the predicted values of our model. The challenge score obtained for our model, using the evaluation metric of the challenge site, was 0.50897. The confusion matrices demonstrate the model's attempt to classify each class label despite the highly imbalanced data. Among the SOTA models, our proposed model outperformed the others, exhibiting favorable performance compared to the previous approaches mentioned in the literature review.

Figure 15. Confusion matrix of the proposed model with SNOMED CT classes on the test dataset.
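A confusion matrix such as the one in Figure 15 can be produced directly from the predicted and actual labels. The sketch below assumes a single-label view of each recording (taking the arg-max of the 27 sigmoid outputs) and uses scikit-learn for computation and plotting; it is an illustrative recipe, not the exact evaluation code of the challenge.

```python
# Minimal sketch (assumed scikit-learn/NumPy): confusion matrix of predicted
# vs. actual classes for a 27-category problem.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

def plot_confusion(y_true_onehot: np.ndarray, y_prob: np.ndarray, class_names):
    # Reduce multi-label probabilities to a single predicted class per record.
    y_true = y_true_onehot.argmax(axis=1)
    y_pred = y_prob.argmax(axis=1)
    cm = confusion_matrix(y_true, y_pred, labels=range(len(class_names)))
    disp = ConfusionMatrixDisplay(cm, display_labels=class_names)
    disp.plot(xticks_rotation="vertical", colorbar=False)
    plt.show()

# Usage (placeholders):
# plot_confusion(y_test, model.predict(x_test), snomed_abbreviations)
```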

5. Discussion
In this section, we investigated various research papers published on the CinC 2020:
Program website [52] and IEEE Xplore and utilized the PhysioNet/CinC Challenge 2020
dataset to effectively classify abnormalities in ECG signal data (Table 3). The objective was
to develop a model capable of learning complex patterns within ECG signals and accurately
distinguishing between 27 different abnormalities.

Table 3. Comparative analysis of the proposed HRIDM model with the state-of-the-art methods for
12-lead ECG abnormality detection on the PhysioNet/CinC Challenge 2020 dataset.

Reference | Model | Noise Reduction or Augmentation | Feature | Additional Approach | Test Score
[18] | Residual CNN-GRU with attention mechanism (RCNN-GRU-AM) | Polyphase Filter Resampling | NA | 24 types, 75 epochs | 12.2%
[19] | Improved residual network (iResNet) | NA | NA | Ensemble Method | 20.8%
[20] | Modified ResNet Type Convolutional Neural Network (MR-CNN) | NA | NA | Bayesian Threshold Optimization | 20.2%
[21] | SE-ResNet34 | NA | 1D-CNN | Evaluation metrics based on weights assigned to different classes | 35.9%
[22] | Hand-Designed Recurrent Convolutional Neural Network (RCNN) | NA | NA | Utilized second neural model | 38.2%
[23] | Squeeze-and-excitation networks for ECG classification (SE-ECGNet) | Data augmentation | Multi-scale features extracted | NA | 41.1%
[24] | Exponentially Dilated Causal Convolutional Neural Network (ED-CCNN) | Pre-trained on a physician-annotated dataset of 254,044 12-lead ECGs | NA | Ensemble Method | 41.7%
[25] | 18-Layer Residual Convolutional Neural Network (ResNet-18) | Notch filters | NA | Post-Training Refinements | 42%
[26] | Adversarial Multi-Source Domain-Generalization (AMDG) | Augmentation techniques | CNN and LSTM | Adversarial domain generalization | 43.7%
[27] | Deep Residual Neural Network with Scatter Transform (DResNet-ST) | Augmentation | NA | 24 types, trainable layers in between scatter transforms | 48.0%
[28] | Ensembled SE-ResNet Model | Wavelet denoising | NA | External open-source data, rule-based bradycardia model | 51.4%
[29] | Modified ResNet with a Squeeze-and-Excitation Layer (MResNet-SE) | NA | NA | Constrained grid-search, bespoke weighted accuracy metric | 52.0%
[30] | Wide and Deep Transformer Neural Network (WDTNN) | Finite impulse response bandpass filter | Handcrafted ECG Features | Combination of features | 53.3%
– | Proposed HRIDM | NA | NA | NA | 50.87%

Among all the state-of-the-art techniques, the highest test score obtained was 53.3%,
achieved by utilizing the Wide and Deep Transformer neural network model [30]. In this
work, the authors utilized an extensive feature engineering approach (Handcrafted ECG
Features) along with preprocessing techniques (finite impulse response bandpass filter).
The second highest official ranking, with a 52.0% test score, was obtained using Modified
ResNet with a Squeeze-and-Excitation Layer (MResNet-SE). This approach utilized zero
padding to extend each signal to 4096 samples and downsampling to resample the signal to 257 Hz [29]. In this study, the authors did not utilize any preprocessing
or feature extraction techniques, but they used additional approaches such as a constrained
grid-search for addressing data imbalance problems and a bespoke weighted accuracy
metric for evaluating model performance.
Another study [28] also utilized SE-Resnet and acquired a test score of 51.4% on the
undisclosed test dataset. In this work, the authors used squeeze-and-excitation blocks to
extract the fine details for 10 s to 30 s segments of the ECG signal and validated model
performance using an external open-source dataset. The authors also used wavelet-based
denoising techniques to remove the noise from the ECG dataset before training, but no
additional feature extraction techniques were utilized. All other methods had lower test
scores compared to our proposed model. Among them, ref. [18] had the lowest score with
12.2%, but due to the correct evaluation strategy, they secured a place in the official ranking.
In this work, the authors used Polyphase Filter Resampling techniques for preprocessing of
the ECG data. No additional feature extraction techniques were used. With this approach,
they classified 24 of the 27 categories and trained for 75 epochs.
Among all the reviewed studies, most authors [18,20,24–26] utilized CNNs in their model design, either directly or indirectly. The second most utilized model was ResNet, as employed by many authors [19,21,27–29] in their methodology, directly or indirectly.
The analysis concluded that all the top four studies, including ours, which secured the
first four places, utilized residual structure in the model design. Based on the observation
of preprocessing techniques such as denoising or data augmentation, it can be concluded
that the studies [19–22,29] did not utilize any of these techniques for preprocessing of
the ECG data. Among these studies, only [29] achieved a better score than the proposed technique. While studies [21,23,26,30] employed additional feature extraction techniques to enhance the ECG abnormality classification, only [30] achieved a better score compared to our proposed technique. Additionally, only studies [19,20,22,29] did not utilize any
preprocessing or feature extraction techniques, although they employed some additional
approaches for result enhancements. Among them, only study [29] achieved a better score
as compared to our proposed methodology.
Although our proposed method achieved fourth place with a 50.87% test score as
compared with the official rankings, its strength lies in its efficiency and simplicity. Unlike
other studies, we did not employ any feature extraction, denoising, additional strategies, or
data augmentation in our proposed methodology. Due to the integration of residual and
inception blocks in our model architecture, its performance is exceptionally good without
relying on common preprocessing steps.
Among the state-of-the-art classifiers, the highest accuracy achieved was 40.6% using
the Inception network. However, the results obtained using these established architectures
did not yield satisfactory performance. The DNNs, LSTM, and LeNet-5 struggled to classify
beyond a single class. This limitation can be attributed to the imbalanced dataset and the
significant variance in data distribution. Consequently, these shallower networks with fewer
parameters failed to converge and could not effectively identify complex patterns within
the data. On the other hand, more recent and deeper architectures such as ResNet-50,
VGG-16, and AlexNet attempted to classify multiple classes but fell short of achieving the
desired results.
In contrast, our proposed hybrid residual/inception-based deeper model (HRIDM)
outperformed all the aforementioned architectures. Despite its superior performance, the
training time for our model was comparable to that of the Inception network. When
comparing the accuracy achieved by our model with previous research discussed in the
literature review, it surpassed many existing approaches. Only two models exhibited
significantly higher test accuracy. One of these models employed a transformer, which is
a relatively larger model that would require substantially more training time. The other
approach proposed a hybrid network comprising an SE-ResNet and a fully connected
network that incorporated age and gender as additional input features, which were later
integrated for classification.
Overall, our proposed approach demonstrated remarkable performance in comparison
to previous research and state-of-the-art architectures, achieving an impressive test score of
50.87%.

6. Conclusions
In this research, we addressed the crucial medical challenge of identifying heart dis-
eases using 12-lead ECG data. We presented a novel DL model (HRIDM) which integrated
two key components: residual blocks and inception blocks. This work utilized the official
PhysioNet/CinC Challenge 2020 dataset, which includes over 41,000 training data samples
and 27 categories of ECG abnormalities. We carefully tuned the hyperparameters for each
block to achieve the best possible results. By allowing the network to identify complex pat-
terns in the inputs, the use of inception blocks increased performance while the addition of
residual blocks reduced the impact of the vanishing gradient issue. Our model significantly outperformed previous investigations, achieving an accuracy score of 50.87% on the test dataset. We have also validated and compared the outcomes of our
proposed model with SOTA models and techniques. Our findings open up new avenues
for heart disease diagnosis research and demonstrate the promise of deep learning models
in the field of cardiology.
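To make the combination of the two block types concrete, the sketch below shows one generic way to place an inception-style multi-branch 1D convolution inside a residual (skip-connection) wrapper. Filter counts, kernel sizes, and the overall stack are placeholders; this is an illustration of the design principle described above, not the published HRIDM architecture.

```python
# Minimal sketch (assumed TensorFlow/Keras): a 1D inception-style block
# wrapped in a residual (skip) connection; illustrative only, not the exact
# HRIDM configuration.
import tensorflow as tf
from tensorflow.keras import layers

def inception_residual_block(x, filters: int = 32):
    # Inception part: parallel convolutions with different kernel sizes.
    b1 = layers.Conv1D(filters, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv1D(filters, 3, padding="same", activation="relu")(x)
    b3 = layers.Conv1D(filters, 5, padding="same", activation="relu")(x)
    b4 = layers.MaxPooling1D(3, strides=1, padding="same")(x)
    b4 = layers.Conv1D(filters, 1, padding="same", activation="relu")(b4)
    merged = layers.Concatenate()([b1, b2, b3, b4])

    # Residual part: project the input to the merged width and add it back,
    # which helps gradients flow through deeper stacks.
    shortcut = layers.Conv1D(4 * filters, 1, padding="same")(x)
    out = layers.Add()([merged, shortcut])
    return layers.Activation("relu")(out)

# Usage: stack such blocks on a (length, 12) ECG input and finish with a
# pooling layer and a 27-unit sigmoid output.
inputs = tf.keras.Input(shape=(5000, 12))
x = inception_residual_block(inputs)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(27, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)
```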

7. Future Work
There are several avenues for further exploration and extension of this research in
the future. Firstly, the application of data augmentation techniques can be employed to
address the issue of imbalanced datasets. By augmenting the existing data, we can achieve
a more balanced representation of each class, reducing the likelihood of misclassifications.
Additionally, incorporating demographic features such as age and sex into
the model architecture can lead to the development of a hybrid network. This hybrid
network can leverage these additional features in conjunction with the ECG data for more
accurate classification.
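A hybrid network of the kind envisaged here can be expressed as a model with two inputs, one for the ECG signal and one for the demographic attributes, whose learned representations are concatenated before the final classifier. The sketch below is a generic Keras illustration of that idea; the branch designs and sizes are placeholders rather than a finished proposal.

```python
# Minimal sketch (assumed TensorFlow/Keras): two-input hybrid network that
# combines ECG-derived features with demographic attributes (e.g. age, sex).
import tensorflow as tf
from tensorflow.keras import layers

ecg_in = tf.keras.Input(shape=(5000, 12), name="ecg")       # 12-lead signal
demo_in = tf.keras.Input(shape=(2,), name="demographics")   # e.g. [age, sex]

# ECG branch: a small convolutional encoder (placeholder design).
x = layers.Conv1D(32, 7, strides=2, activation="relu")(ecg_in)
x = layers.Conv1D(64, 5, strides=2, activation="relu")(x)
x = layers.GlobalAveragePooling1D()(x)

# Demographic branch: a small dense encoder.
d = layers.Dense(8, activation="relu")(demo_in)

# Fuse both representations and classify into the 27 categories.
merged = layers.Concatenate()([x, d])
out = layers.Dense(27, activation="sigmoid")(merged)

model = tf.keras.Model(inputs=[ecg_in, demo_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Usage (placeholders):
# model.fit({"ecg": x_train, "demographics": demo_train}, y_train, epochs=20)
```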

Author Contributions: Methodology, S.A.M.; Conceptualization, S.A.M. and H.M.R.; software,


H.M.R.; visualization, H.M.R.; writing—original draft preparation, S.A.M., H.M.R. and J.Y.; validation,
J.Y. and S.A.M., writing—review and editing, J.Y., H.M.R. and S.A.M.; supervision, J.Y. All authors
have read and agreed to the published version of the manuscript.
Funding: No funding was received for this work.
Data Availability Statement: The dataset utilized in this work is freely available on the official
PhysioNet/CinC Challenge 2020 website: https://physionet.org/content/challenge-2020/1.0.2/
(accessed on 29 July 2022).
Conflicts of Interest: There are no conflicts of interest present for this work.

Appendix A
This appendix contains two tables: Table A1, which provides the abbreviations used
in Figure 2, and Table A2, which offers a detailed description of the dataset.

Table A1. The Abbreviations corresponding to the Diagnosis and SNOMED CT codes used in the
utilized dataset.

Diagnosis | SNOMED CT Code | Abbreviation
1st degree AV block | 270492004 | IAVB
Atrial fibrillation | 164889003 | AF
Atrial flutter | 164890007 | AFL
Bradycardia | 426627000 | Brady
Complete right bundle branch block | 713427006 | CRBBB
Incomplete right bundle branch block | 713426002 | IRBBB
Left anterior fascicular block | 445118002 | LAnFB
Left axis deviation | 39732003 | LAD
Left bundle branch block | 164909002 | LBBB
Low QRS voltages | 251146004 | LQRSV
Nonspecific intraventricular conduction disorder | 698252002 | NSIVCB
Pacing rhythm | 10370003 | PR
Premature atrial contraction | 284470004 | PAC
Premature ventricular contractions | 427172004 | PVC
Prolonged PR interval | 164947007 | LPR
Prolonged QT interval | 111975006 | LQT
Q wave abnormal | 164917005 | QAb
Right axis deviation | 47665007 | RAD
Right bundle branch block | 59118001 | RBBB
Sinus arrhythmia | 427393009 | SA
Sinus bradycardia | 426177001 | SB
Sinus rhythm | 426783006 | NSR
Sinus tachycardia | 427084000 | STach
Supraventricular premature beats | 63593006 | SVPB
T wave abnormal | 164934002 | TAb
T wave inversion | 59931005 | TInv
Ventricular premature beats | 17338001 | VPB

Table A2. Number of recordings, mean duration of recordings, mean age of patients in recordings,
sex of patients in recordings, and sample frequency of recordings for each dataset.

Dataset | Number of Recordings | Mean Duration (Seconds) | Mean Age (Years) | Sex (Male/Female) | Sample Frequency (Hz)
CPSC (all data) | 13,256 | 16.2 | 61.1 | 53%/47% | 500
CPSC Training | 6877 | 15.9 | 60.2 | 54%/46% | 500
CPSC-Extra Training | 3453 | 15.9 | 63.7 | 53%/46% | 500
Hidden CPSC | 2926 | 17.4 | 60.4 | 52%/48% | 500
INCART | 72 | 1800 | 56 | 54%/46% | 257
PTB | 516 | 110.8 | 56.3 | 73%/27% | 1000
PTB-XL | 21,837 | 10 | 59.8 | 52%/48% | 500
G12EC (all data) | 20,678 | 10 | 60.5 | 54%/46% | 500
G12EC Training | 10,344 | 10 | 60.5 | 54%/46% | 500
Hidden G12EC | 10,344 | 10 | 60.5 | 54%/46% | 500
Undisclosed | 10,000 | 10 | 63 | 53%/47% | 300

References
1. Dziadosz, D.; Daniłowicz-Szymanowicz, L.; Wejner-Mik, P.; Budnik, M.; Brzezińska, B.; Duchnowski, P.; Golińska-Grzybała, K.;
Jaworski, K.; Jedliński, I.; Kamela, M.; et al. What Do We Know So Far About Ventricular Arrhythmias and Sudden Cardiac Death
Prediction in the Mitral Valve Prolapse Population? Could Biomarkers Help Us Predict Their Occurrence? Curr. Cardiol. Rep.
2024, 26, 245–268. [CrossRef]
2. Santangelo, G.; Bursi, F.; Faggiano, A.; Moscardelli, S.; Simeoli, P.S.; Guazzi, M.; Lorusso, R.; Carugo, S.; Faggiano, P. The Global
Burden of Valvular Heart Disease: From Clinical Epidemiology to Management. J. Clin. Med. 2023, 12, 2178. [CrossRef] [PubMed]
3. Kim, S.Y.; Lee, J.-P.; Shin, W.-R.; Oh, I.-H.; Ahn, J.-Y.; Kim, Y.-H. Cardiac biomarkers and detection methods for myocardial
infarction. Mol. Cell. Toxicol. 2022, 18, 443–455. [CrossRef]
4. Rai, H.M.; Chatterjee, K.; Dashkevych, S. The prediction of cardiac abnormality and enhancement in minority class accuracy from
imbalanced ECG signals using modified deep neural network models. Comput. Biol. Med. 2022, 150, 106142. [CrossRef]
5. Kim, M.-G.; Choi, C.; Pan, S.B. Ensemble Networks for User Recognition in Various Situations Based on Electrocardiogram. IEEE
Access 2020, 8, 36527–36535. [CrossRef]
6. Gong, Z.; Tang, Z.; Qin, Z.; Su, X.; Choi, C. Electrocardiogram identification based on data generative network and non-fiducial
data processing. Comput. Biol. Med. 2024, 173, 108333. [CrossRef]
7. Rahman, A.-U.; Asif, R.N.; Sultan, K.; Alsaif, S.A.; Abbas, S.; Khan, M.A.; Mosavi, A. ECG Classification for Detecting ECG
Arrhythmia Empowered with Deep Learning Approaches. Comput. Intell. Neurosci. 2022, 2022, 6852845. [CrossRef]
8. Choi, G.; Ziyang, G.; Wu, J.; Esposito, C.; Choi, C. Multi-modal Biometrics Based Implicit Driver Identification System Using
Multi-TF Images of ECG and EMG. Comput. Biol. Med. 2023, 159, 106851. [CrossRef]
9. Zeng, Y.; Zhan, G. Extracting cervical spine popping sound during neck movement and analyzing its frequency using wavelet
transform. Comput. Biol. Med. 2022, 141, 105126. [CrossRef] [PubMed]
10. Asif, R.N.; Abbas, S.; Khan, M.A.; Rahman, A.U.; Sultan, K.; Mahmud, M.; Mosavi, A. Development and Validation of Embedded
Device for Electrocardiogram Arrhythmia Empowered with Transfer Learning. Comput. Intell. Neurosci. 2022, 2022, 5054641.
[CrossRef]
11. Asif, R.N.; Ditta, A.; Alquhayz, H.; Abbas, S.; Khan, M.A.; Ghazal, T.M.; Lee, S.-W. Detecting Electrocardiogram Arrhythmia
Empowered with Weighted Federated Learning. IEEE Access 2024, 12, 1909–1926. [CrossRef]
12. Kim, H.J.; Lim, J.S. Study on a Biometric Authentication Model based on ECG using a Fuzzy Neural Network. IOP Conf. Ser.
Mater. Sci. Eng. 2018, 317, 012030. [CrossRef]
13. Kim, Y.; Choi, G.; Choi, C. One-Dimensional Shallow Neural Network Using Non-Fiducial Based Segmented Electrocardiogram
for User Identification System. IEEE Access 2023, 11, 102483–102491. [CrossRef]
14. Islam, M.S.; Hasan, K.F.; Sultana, S.; Uddin, S.; Quinn, J.M.; Moni, M.A. HARDC: A novel ECG-based heartbeat classification
method to detect arrhythmia using hierarchical attention based dual structured RNN with dilated CNN. Neural Netw. 2023, 162,
271–287. [CrossRef] [PubMed]
15. Hammad, M.; Pławiak, P.; Wang, K.; Acharya, U.R. ResNet-Attention model for human authentication using ECG signals. Expert
Syst. 2021, 38, e12547. [CrossRef]
16. Rai, H.M.; Chatterjee, K. Hybrid CNN-LSTM deep learning model and ensemble technique for automatic detection of myocardial
infarction using big ECG data. Appl. Intell. 2022, 52, 5366–5384. [CrossRef]
17. Rai, H.M.; Chatterjee, K. 2D MRI image analysis and brain tumor detection using deep learning CNN model LeU-Net. Multimedia
Tools Appl. 2021, 80, 36111–36141. [CrossRef]
18. Nejedly, P.; Ivora, A.; Viscor, I.; Halamek, J.; Jurak, P.; Plesinger, F. Utilization of Residual CNN-GRU with Attention Mechanism
for Classification of 12-lead ECG. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020.
[CrossRef]
19. Yang, S.; Xiang, H.; Kong, Q.; Wang, C. Multi-label Classification of Electrocardiogram with Modified Residual Networks. In
Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [CrossRef]
20. Vicar, T.; Hejc, J.; Novotna, P.; Ronzhina, M.; Janousek, O. ECG Abnormalities Recognition Using Convolutional Network with
Global Skip Connections and Custom Loss Function. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16
September 2020. [CrossRef]
21. Jia, W.; Xu, X.; Xu, X.; Sun, Y.; Liu, X. Automatic Detection and Classification of 12-lead ECGs Using a Deep Neural Network. In
Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [CrossRef]
22. Fayyazifar, N.; Ahderom, S.; Suter, D.; Maiorana, A.; Dwivedi, G. Impact of Neural Architecture Design on Cardiac Ab-
normality Classification Using 12-lead ECG Signals. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy,
13–16 September 2020. [CrossRef]
23. Chen, J.; Chen, T.; Xiao, B.; Bi, X.; Wang, Y.; Li, W.; Duan, H.; Zhang, J.; Ma, X. SE-ECGNet: Multi-scale SE-Net for Multi-lead ECG
Data. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [CrossRef]
24. Bos, M.; van de Leur, R.; Vranken, J.; Gupta, D.; van der Harst, P.; Doevendans, P.; van Es, R. Automated Comprehensive
Interpretation of 12-lead Electrocardiograms Using Pre-trained Exponentially Dilated Causal Convolutional Neural Networks. In
Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [CrossRef]
25. Min, S.; Choi, H.-S.; Han, H.; Seo, M.; Kim, J.-K.; Park, J.; Jung, S.; Oh, I.-Y.; Lee, B.; Yoon, S. Bag of Tricks for Electro-
cardiogram Classification with Deep Neural Networks. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy,
13–16 September 2020. [CrossRef]
26. Hasani, H.; Bitarafan, A.; Soleymani, M. Classification of 12-lead ECG Signals with Adversarial Multi-Source Domain Generaliza-
tion. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [CrossRef]
27. Oppelt, M.; Riehl, M.; Kemeth, F.; Steffan, J. Combining Scatter Transform and Deep Neural Networks for Multilabel ECG Signal
Classification. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020. [CrossRef]
28. Zhu, Z.; Wang, H.; Zhao, T.; Guo, Y.; Xu, Z.; Liu, Z.; Liu, S.; Lan, X.; Sun, X.; Feng, M. Classification of Cardiac Abnormalities
from ECG Signals Using SE-ResNet. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020.
[CrossRef]
29. Zhao, Z.; Fang, H.; Relton, S.; Yan, R.; Liu, Y.; Li, Z.; Qin, J.; Wong, D. Adaptive lead weighted ResNet trained with different dura-
tion signals for classifying 12-lead ECGs. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020.
[CrossRef]
30. Natarajan, A.; Chang, Y.; Mariani, S.; Rahman, A.; Boverman, G.; Vij, S.; Rubin, J. A Wide and Deep Transformer Neural Network
for 12-Lead ECG Classification. In Proceedings of the 2020 Computing in Cardiology, Rimini, Italy, 13–16 September 2020.
[CrossRef]
31. Liu, F.; Liu, C.; Zhao, L.; Zhang, X.; Wu, X.; Xu, X.; Liu, Y.; Ma, C.; Wei, S.; He, Z.; et al. An Open Access Database for Evaluating
the Algorithms of Electrocardiogram Rhythm and Morphology Abnormality Detection. J. Med. Imaging Health Inform. 2018, 8,
1368–1373. [CrossRef]
32. Goldberger, A.L.; Amaral, L.A.N.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.-K.;
Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals.
Circulation 2000, 101, e215–e220. [CrossRef]
33. Alday, E.A.P.; Gu, A.; Shah, A.J.; Robichaux, C.; Wong, A.-K.I.; Liu, C.; Liu, F.; Rad, A.B.; Elola, A.; Seyedi, S.; et al. Classification
of 12-lead ECGs: The PhysioNet/Computing in Cardiology Challenge 2020. Physiol. Meas. 2020, 41, 124003. [CrossRef]
34. Rai, H.M.; Chatterjee, K. A Novel Adaptive Feature Extraction for Detection of Cardiac Arrhythmias Using Hybrid Technique
MRDWT & MPNN Classifier from ECG Big Data. Big Data Res. 2018, 12, 13–22. [CrossRef]
35. Jahmunah, V.; Ng, E.Y.K.; Tan, R.S.; Oh, S.L.; Acharya, U.R. Uncertainty quantification in DenseNet model using myocardial
infarction ECG signals. Comput. Methods Programs Biomed. 2023, 229, 107308. [CrossRef]
36. Barua, P.D.; Aydemir, E.; Dogan, S.; Kobat, M.A.; Demir, F.B.; Baygin, M.; Tuncer, T.; Oh, S.L.; Tan, R.-S.; Acharya, U.R. Multilevel
hybrid accurate handcrafted model for myocardial infarction classification using ECG signals. Int. J. Mach. Learn. Cybern. 2023,
14, 1651–1668. [CrossRef]
37. Al-Jibreen, A.; Al-Ahmadi, S.; Islam, S.; Artoli, A.M. Person identification with arrhythmic ECG signals using deep convolution
neural network. Sci. Rep. 2024, 14, 4431. [CrossRef] [PubMed]
38. Baumgartner, M.; Veeranki, S.P.K.; Hayn, D.; Schreier, G. Introduction and Comparison of Novel Decentral Learning Schemes
with Multiple Data Pools for Privacy-Preserving ECG Classification. J. Healthc. Inform. Res. 2023, 7, 291–312. [CrossRef]
39. Janbhasha, S.; Bhavanam, S.N.; Harshita, K. GAN-Based Data Imbalance Techniques for ECG Synthesis to Enhance Classification
Using Deep Learning Techniques and Evaluation. In Proceedings of the 2023 3rd International Conference on Advances in
Electrical, Computing, Communication and Sustainable Technologies (ICAECT), Bhilai, India, 5–6 January 2023; Institute of
Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2023. [CrossRef]
40. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86,
2278–2324. [CrossRef]
41. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in
Neural Information Processing Systems; Pereira, F., Burges, C.J., Bottou, L., Weinberger, K.Q., Eds.; Curran Associates Inc.: Red
Hook, NY, USA, 2012; Available online: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e9
24a68c45b-Paper.pdf (accessed on 27 July 2023).
42. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017,
60, 84–90. [CrossRef]
43. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
44. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [CrossRef]
45. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A.; Liu, W.; et al. Going
deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
Boston, MA, USA, 7–12 June 2015; pp. 1–9. [CrossRef]
46. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
47. Maniatopoulos, A.; Mitianoudis, N. Learnable Leaky ReLU (LeLeLU): An Alternative Accuracy-Optimized Activation Function.
Information 2021, 12, 513. [CrossRef]
48. Dubey, A.K.; Jain, V. Comparative Study of Convolution Neural Network’s Relu and Leaky-Relu Activation Functions. In
Applications of Computing, Automation and Wireless Systems in Electrical Engineering: Proceedings of MARC 2018; Springer: Singapore,
2019; pp. 873–880. [CrossRef]
49. Dubey, S.R.; Singh, S.K.; Chaudhuri, B.B. Activation functions in deep learning: A comprehensive survey and benchmark.
Neurocomputing 2022, 503, 92–108. [CrossRef]
50. Rai, H.M.; Yoo, J.; Dashkevych, S. Two-headed UNetEfficientNets for parallel execution of segmentation and classification of
brain tumors: Incorporating postprocessing techniques with connected component labelling. J. Cancer Res. Clin. Oncol. 2024,
150, 220. [CrossRef] [PubMed]
51. Ye, X.; Huang, Y.; Lu, Q. Automatic Multichannel Electrocardiogram Record Classification Using XGBoost Fusion Model. Front.
Physiol. 2022, 13, 840011. [CrossRef] [PubMed]
52. CinC 2020: Program. Available online: https://www.cinc.org/archives/2020/ (accessed on 29 July 2022).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
