2 (2023) 185
HUANG Yongfeng (黄永锋) 1*, HUANG Qihong (黄琦洪) 1, SUN Chenxi (孙晨汐) 1, YANG Shuchen (杨树臣) 2, ZHANG Zhiming (张智明) 2
1 School of Computer Science and Technology, Donghua University, Shanghai 201620, China
2 Shanghai Yueyang Medtech Co., Ltd., Shanghai 201203, China
Abstract: Sleep apnea is a common health condition that can affect numerous aspects of life and may cause many health problems, especially in the middle-aged and elderly population. Polysomnography (PSG), the gold standard, is an expensive and inconvenient way to diagnose sleep apnea. In contrast, the ballistocardiogram (BCG) can be collected by devices embedded in the surrounding environment, enabling imperceptible sleep apnea detection. Moreover, to obtain fine-grained apnea fragments, a multistage sleep apnea detection model is proposed. The model first uses an improved convolution neural network (CNN) model to coarsely identify apnea events, and then applies a U-Net based model to finely segment apnea fragments. In the experiment, sleep data of 11 patients with apnea, about 70 h in total, were collected, including BCG data derived from 18 piezoelectric polyvinylidene fluoride (PVDF) sensors embedded in the mattress and PSG data collected synchronously. The results show that the classification model reaches an accuracy of 95.7% and the segmentation model a dice coefficient of 0.818, which indicates that the proposed model can almost match the performance of PSG in detecting apnea.
Key words: sleep apnea; ballistocardiogram; convolution neural network (CNN); deep learning
CLC number: TP399 Document code: A
Article ID: 1672-5220(2023)02-0185-08
Open Science Identity (OSID)

Introduction

Sleep apnea, an extremely common sleep disorder in modern society, may lead to shortness of breath and affect sleep quality. Studies have shown that repeated episodes of apnea lead to nocturnal hypoxia, which in turn can cause many serious complications such as hypertension, diabetes, cardiovascular and cerebrovascular diseases, and even sudden death at night [1]. Moreover, about 10% of the population suffers from sleep apnea, mainly owing to increasing social pressure and various psychological diseases. Some researchers believe that anxiety and depression are gradually becoming the main or even the sole cause of sleep apnea [2]. Because these problems are so widespread, sleep apnea has become a non-negligible disease in modern society.

Since sleep apnea requires monitoring of sleep conditions throughout the night, polysomnography (PSG) has been proposed in clinical medicine for the diagnosis of apnea. PSG, performed in a sleep laboratory, is considered the gold standard for sleep apnea diagnosis. It records various physiological signals, including nasal inspiratory airflow, electroencephalogram (EEG) and electrocardiogram (ECG), allowing researchers to achieve accurate results. However, it is uncomfortable and may place a psychological burden on patients. Furthermore, for a patient with severe apnea, monitoring at home and inspection in the hospital should be given equal attention. The ballistocardiogram (BCG), a universal medical technique, records the change of pressure caused by cardiovascular activity and respiration in a cost-effective and noninvasive way [3-4]. Since the signal can be measured by sensors embedded in the ambient environment without the need for medical staff, it is a potential method for home sleep apnea monitoring.

However, most past studies [5-9] on sleep apnea have focused only on classification tasks using traditional methods such as template matching, heart rate detection paired with machine learning and deep learning methods. Specifically, the classification results can only identify whether sleep apnea occurs in a period, or what kind. In fact, more detailed information such as the frequency of sleep apnea and the approximate period of sleep apnea is of equal significance. Hence, past studies fail to provide meaningful information for the clinical diagnosis of sleep apnea, for example, the apnea-hypopnea index (AHI). So, we propose a fine-grained multistage sleep apnea detection model. This model first identifies apnea events using a convolutional neural network (CNN), and
then uses a U-Net based model to segment the time period during which the apnea occurs.

Furthermore, a high accuracy with a single-channel BCG signal was achieved in Ref. [10]. However, a single-sensor system may cause measurement problems such as low coverage and non-continuity, since BCG can only be detected when the human body produces pressure on the mechanical sensor. To tackle this problem, we propose that, owing to their noninvasive characteristic, multiple-sensor systems can be used to collect BCG signals at the same time and make full use of them through channel fusion to improve the accuracy of detection.

The main contributions are the following two points. Firstly, we propose a more fine-grained multistage sleep apnea detection model that can reach sample-point-level resolution, which is a higher resolution than that of state-of-the-art methods. Secondly, different multichannel data fusion methods are compared, and the squeeze-and-excitation net (SENet) shows the best results in contrast experiments with different parameters.

1 Methods

Figure 1 shows the global architecture of the proposed U-Breath model. The model first solves the multichannel BCG signal fusion problem, and then determines whether a sleep apnea event happens or not. If it happens, fine-grained segmentation is further performed and the marked apnea segment is output.

roughly equivalent to the head and chest of the human body.

In this study, we collected BCG data from 11 apnea patients. Data acquisition covered approximately 7 h of sleep throughout the night for each subject. The BCG sampling frequency is 50 Hz. To improve the generalization ability of the model, we also added BCG data from a healthy subject, collected in the same way.

For comparison, the PSG data of the above 11 patients were also recorded and the segments where apnea occurred were labeled by a professional sleep physician. Professional sleep specialists determined if apnea occurred based on the nasal airflow and thoracoabdominal breathing conditions collected by the PSG. Central apnea is characterized by the absence of both nasal airflow and thoracoabdominal breathing, while obstructive apnea is characterized by undetectable nasal airflow and normal thoracoabdominal breathing. Sleep physicians use the above criteria to determine whether apnea occurs and how long it has lasted.

Then a Butterworth digital filter with a 3 dB cut-off frequency of 10 Hz and an order of 7 is applied to remove high-frequency noise. The Z-score normalization method is used to normalize the 18 channel signals to accelerate the convergence of the model. Moreover, we use the sliding window method to organize the data. Specifically, Fig. 2 shows the distribution of apnea durations over all samples. It can be seen that the duration of most of the samples is concentrated in the range from 10 s to 60 s, and the duration of the longest apnea sample is around 120 s. Therefore, the length of each window is 90 s, and to expand the sample data set, we slide the window in 10-second steps. Within the 90-second window, the data that contain apnea fragments longer than 10 s are regarded as positive samples, while the rest are negative samples.
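The normalization and windowing steps described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the `(start_s, end_s)` event format are hypothetical, and the labeling rule assumes that "apnea fragments longer than 10 s" refers to the longest overlap between a window and any single labeled apnea event. The Butterworth filtering step is omitted.

```python
from statistics import mean, pstdev

FS = 50           # sampling frequency in Hz (from the text)
WINDOW_S = 90     # window length in seconds (from the text)
STEP_S = 10       # sliding step in seconds (from the text)
MIN_APNEA_S = 10  # apnea length needed for a positive label (from the text)


def zscore(channel):
    """Z-score normalize one channel: subtract the mean, divide by the std."""
    mu = mean(channel)
    sigma = pstdev(channel)
    if sigma == 0:  # constant channel: avoid division by zero
        return [0.0 for _ in channel]
    return [(x - mu) / sigma for x in channel]


def window_starts(total_s, window_s=WINDOW_S, step_s=STEP_S):
    """Start times (in seconds) of every full window that fits the recording."""
    return list(range(0, total_s - window_s + 1, step_s))


def longest_apnea_overlap(start_s, events, window_s=WINDOW_S):
    """Longest overlap (s) between the window and any single apnea event.

    `events` is a hypothetical list of (start_s, end_s) physician labels.
    """
    end_s = start_s + window_s
    overlaps = [min(end_s, e_end) - max(start_s, e_start)
                for e_start, e_end in events]
    return max((o for o in overlaps if o > 0), default=0)


def label_windows(total_s, events):
    """Return (window_start_s, label) pairs; label 1 marks a positive sample."""
    return [(s, int(longest_apnea_overlap(s, events) > MIN_APNEA_S))
            for s in window_starts(total_s)]
```

At 50 Hz, each 90 s window holds `WINDOW_S * FS = 4500` sample points per channel, which matches the 18 × 4 500 input size used by the model.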
Fig. 3, in which the sign 18 × 4 500 means that the input signal consists of 18 channels and 4 500 one-dimensional sample points; the signs 18 × 1, c@1 × 1 and 1 × 4 500 represent the output signal size after the current layer processing of the model, where c is a variable parameter. SENet consists of three parts: squeeze, excitation and scale. Its main purpose is to make the model pay more attention to the channel features with the largest amount of information and suppress unimportant channel features.
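The three SENet parts named above (squeeze, excitation and scale) can be sketched in plain Python. This is a generic squeeze-and-excitation block under stated assumptions, not the authors' network: the weights are random placeholders, the hidden width `c` is assumed to play the role of the variable parameter c, and the further reduction to a single 1 × 4 500 signal is not covered because this excerpt does not describe it.

```python
import math
import random


def squeeze(x):
    """Squeeze: global average pooling, one scalar per channel."""
    return [sum(ch) / len(ch) for ch in x]


def dense(v, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * a for w, a in zip(row, v)) + b
            for row, b in zip(weights, biases)]


def excitation(z, w1, b1, w2, b2):
    """Excitation: bottleneck of width c (ReLU), then sigmoid channel gates."""
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2, b2)]


def scale(x, gates):
    """Scale: multiply every sample of a channel by that channel's gate."""
    return [[g * v for v in ch] for ch, g in zip(x, gates)]


def se_block(x, w1, b1, w2, b2):
    """Apply squeeze, excitation and scale to x (channels x samples)."""
    gates = excitation(squeeze(x), w1, b1, w2, b2)
    return scale(x, gates), gates


def random_params(n_channels, c, seed=0):
    """Hypothetical untrained weights, for illustration only."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_channels)] for _ in range(c)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(n_channels)]
    return w1, [0.0] * c, w2, [0.0] * n_channels
```

In a trained model the gates would be learned so that informative channels receive weights near 1 and noisy channels weights near 0. Here they only demonstrate the data flow from an 18-channel input to 18 per-channel weights.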

2 Experiments

2.1 Classification

To verify the effectiveness of the classification network, we manually select one channel signal as the input. Alivar et al. [14] found that the channel with better signal quality should follow a normal distribution. Therefore, we choose the data that best fit the normal distribution as the input of the classification network in our experiments. Figure 6 shows the data distribution of the 18 sensors for one subject we randomly selected.
To validate the performance of the proposed model, we trained several different models for comparison. The first one is the U-Breath model proposed in this paper. Next, we reproduced the model from a recent study [15], which proposed a model for apnea event detection based on an SVM implementation. Then, a bidirectional long short-term memory (Bi-LSTM) based [16] model and a TCN based [17] model from the study [18] were also used for comparison. Note that all the above-mentioned models use the same data input format.

In the U-Breath model experiment, we referred to its original study [13] to set our experimental parameters. Specifically, the dropout layer probability is 0.2. That is, 20% of the neuron output values are set to 0 randomly [19]. Then we use a mini-batch method for training [20], and gradient descent with an Adam optimizer is performed on 256 samples at a time. The learning rate is 0.001, and the total number of training epochs is 200. We use cross entropy as the loss function, which measures the difference between two data distributions. The cross entropy loss $l_c$ is calculated as

$l_c = -[y \lg \hat{y} + (1 - y) \lg(1 - \hat{y})]$, (1)

where $y$ represents the real value and $\hat{y}$ represents the predicted value. For the other comparison models, we referred to the parameter settings in their respective articles to train the models.

The total number of samples is 23 573, including 10 541 positive samples and 13 032 negative samples. We extracted the training set, the validation set and the test set from the BCG data of 11 subjects in the proportion of 60%, 20% and 20%, respectively, and combined them into the final training set, the final validation set and the final test set. The number of samples of the classification network is shown in Table 1.

Table 1 Sample number of each dataset in classification network
Type       Train    Validation  Test    Total
Positive   6 325    2 108       2 108   10 541
Negative   7 820    2 606       2 606   13 032
Total      14 145   4 714       4 714   23 573

After training for 200 epochs, the U-Breath model converges. Figure 7(a) shows the train loss and validation loss of the U-Breath model. Meanwhile, the training of each comparison model was also completed according to its respective parameter settings. Table 2 shows the evaluation metrics of all classification models, where the experimental results of the SVM model are averaged over 5-fold cross validation. Figure 8(a) shows the receiver operating characteristic (ROC) curves of all models and the respective areas under the curve $A_{ROC}$, both of which indicate that the U-Breath model outperforms the other models and achieves strong results.
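The cross entropy loss of Eq. (1) can be sketched for a single sample and for a mini-batch as follows. This is an illustrative sketch, not the authors' training code: it assumes the natural logarithm (a different base only rescales the loss by a constant), and the clipping constant `eps` is a hypothetical safeguard against taking the logarithm of zero.

```python
import math


def cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross entropy for one sample, following the form of Eq. (1).

    y is the real label (0 or 1); y_hat is the predicted probability.
    """
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # keep log() arguments positive
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))


def mean_cross_entropy(labels, preds):
    """Average loss over a mini-batch of (label, prediction) pairs."""
    return sum(cross_entropy(y, p) for y, p in zip(labels, preds)) / len(labels)
```

A confident correct prediction such as `cross_entropy(1, 0.9)` gives a small loss, while a confident wrong one such as `cross_entropy(1, 0.1)` gives a large loss, which is what drives the classifier toward the physician labels.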
Fig. 7 U-Breath model train loss and validation loss: ( a) classification result; ( b) segmentation result
Fig. 8 Evaluation metrics for different models: (a) ROC curves for different classification models; (b) precision-recall (PR) curve for U-Breath model
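The accuracy, precision, recall and F-score used to compare the classification models follow the usual confusion-matrix definitions. The sketch below assumes binary labels and the balanced F-score (F1); the function name and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F-score from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives on the evaluation set.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=8, fp=2, tn=7, fn=3)
```

Because the F-score is the harmonic mean of precision and recall, it stays low whenever either of the two is low, which makes it a stricter summary than accuracy when positive and negative samples are unbalanced.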
Table 2 Evaluation metrics for different models
Model      Accuracy  Precision  Recall  F-score
SVM        0.636     0.721      0.656   0.687
Bi-LSTM    0.670     0.657      0.635   0.646
TCN        0.653     0.581      0.797   0.687
U-Breath   0.957     0.958      0.958   0.958

2.2 Segmentation

In the segmentation model, we refer to U-Net to set our experimental parameters, and the mini-batch method is also used, with gradient descent performed on 64 samples at a time. Then we use the Adam optimizer to update the gradients. The learning rate is 0.001, and the total number of training epochs is 200. In addition, the dice coefficient, widely used as the basis of a loss function in image segmentation, evaluates the similarity between the prediction set and the real set. The calculation formulas of the dice coefficient and the dice loss are given by

classification network, we keep the original parameters of the classification network unchanged and only feed the single channel signal into the classification network after the channel selection or fusion process for validation. For SENet, in the experiments, we also verify the performance of the network with different squeeze layers c (SE-c). According to Ref. [10], we verify the network performance when c takes the five values 1, 2, 4, 8 and 16.

Tables 3 and 4 show the experimental results of the U-Breath model after adding different channel fusion layers. Here, for the convenience of comparison, the experimental results of the manual channel selection approach are also added in Tables 3 and 4. Similarly, we also combine the channel fusion layer and the segmentation network for the experiment, and the experimental results are shown in Tables 5 and 6.

Table 3 Evaluation metrics for the combination of classification networks and different channel fusion methods