Deep Learning Based Channel Estimation Algorithm Over Time Selective Fading Channels
Abstract—Research on deep learning applications for the physical layer has received much attention in recent years. In this paper, we propose a Deep Learning (DL) based channel

possible [3]. In Orthogonal Frequency Division Multiplexing (OFDM) [4] systems, a deep learning algorithm for joint channel estimation and signal detection has been studied in [5].
II. SYSTEM MODEL

In this section, the signal architecture and the time varying Rayleigh fading channel model are first presented. Then, a signal flow model is introduced. Denote the transmitted signal and the received signal as x and y, respectively, and denote the Rayleigh time varying channel as h. Considering a Linear Time Variant (LTV) model, the relation between the input and the output of the channel is:

y = h · x + ω    (1)

where ω is an i.i.d. Additive White Gaussian Noise (AWGN) vector with ω_i ∼ CN(0, σ_n²).

A. Time Varying Rayleigh Fading Channel Model

Typically, the wireless communication environment is modeled as a Rayleigh fading channel. Multi-path causes frequency selective fading, while Doppler shifting results in time selective fading. However, in this paper, only time selective fading is considered in order to give a first exploration of rapidly varying channels. The influence of multi-path will be studied in future work.

Clarke's model [15] is used in this paper to describe the time varying channel. To capture the time varying characteristic, the Jakes Doppler spectrum [16] is adopted:

S(f) = 1 / (π f_d √(1 − (f/f_d)²)),  |f| < f_d    (2)

where f_d is the maximum Doppler shift. Given a speed v (m/s) and a carrier frequency f_c (Hz), f_d = v f_c / c, where c ≈ 3.0 × 10⁸ m/s is the speed of light in free space. The autocorrelation of the Jakes Doppler spectrum is:

R(τ) = ∫_{−f_d}^{f_d} S(f) exp(j2πfτ) df = J_0(2π f_d τ)    (3)

where J_0(·) is the Bessel function of the first kind of order 0, and the discrete form of the autocorrelation is:

R[d] = J_0(2π φ_d |d|)    (4)

where φ_d = f_d / r_s is the maximum Doppler frequency normalized by the sampling rate r_s. Besides, the channel is assumed to have normalized gain E(|h[n]|²) = 1 in order to simplify the following analysis.

B. Signal Architecture

Considering a standard signal architecture, transmitted signals are generated as shown in Fig. 1. A single data frame consists of K blocks. Since multi-path is not considered in this article, a guard interval is not necessary. Each block has N_s information symbols and N_p pilot symbols; thus, each block has N_s + N_p = N symbols and the whole frame has L = NK symbols in total. Pilots are inserted at equal intervals in each block, and N_p/(N_s + N_p) is defined as the pilot density. Besides, the pilots in each block are identical, which results in repetition in the time domain.

C. Signal Flow Model

The signal flow model is shown in Fig. 2. At the transmitter side, no deep learning technology is introduced. Information bits and pilot bits are combined to generate the original signal. After modulation, the transmitted signal x is sent to the channel, and the modulated pilots p are sent to the NN estimator. At the receiver side, the NN channel estimator uses p and the channel distorted, noisy signal y to produce the estimate of the channel h.

Two things should be noted. Firstly, since no NN is introduced at the transmitter, it is easy to add any traditional channel coding, such as Low Density Parity Check (LDPC) codes [17], to improve the performance against noise. Secondly, the NN channel estimator does not need any information about the channel, which means the communication system is model free.

D. Traditional Algorithms For Channel Estimation

In channel estimation, the most common estimators are the Least Square (LS) [18] estimator and the Minimal Mean Square
Error (MMSE) [19] estimator. According to (1), the LS estimator under the time varying channel is:

ĥ_LS = y / x    (5)

For the positions where pilots are inserted, the above equation can be used directly to obtain the estimate. For the other positions, linear interpolation is necessary. Denote by j, k (j < k) the pilot positions nearest to position i. The interpolated channel is then:

ĥ_{i,LS} = ((k − i)/(k − j)) ĥ_{j,LS} + ((i − j)/(k − j)) ĥ_{k,LS}    (6)

Due to the existence of noise, and omitting the influence of interpolation, the expected Mean Square Error (MSE) of the LS estimator is:

E(|ĥ_LS − h|²) = E(ω²/x²) = 1/SNR    (7)

Another traditional estimator is the MMSE estimator:

ĥ_MMSE = R_hy R_yy^{−1} y = R_hh (R_hh + (σ_n²/σ_s²) I)^{−1} ĥ_LS    (8)

where I is the identity matrix and R_hh = E(h h^H) is the correlation matrix:

R_hh = [ R[0]     R[1]     R[2]     ···   R[L−1]
         R[1]     R[0]     R[1]     ···   R[L−2]
         R[2]     R[1]     R[0]     ···   R[L−3]
          ⋮        ⋮        ⋮       ⋱      ⋮
         R[L−1]   R[L−2]   R[L−3]   ···   R[0]   ]

where R[·] can be calculated according to (4).

It should be noticed that the form of the autocorrelation function of the channel and the Doppler speed need to be given in advance in order to perform MMSE estimation. However, the real channel model and accurate statistical characteristics (the Doppler speed here) are hard to know in practical applications. Thus, two methods for MMSE estimation are used in the simulations. Firstly, assuming the above information is already known, ĥ_MMSE can be calculated directly according to (3) and (8); we call this method "MMSE theory". Secondly, after obtaining the LS estimate, ĥ_LS can be used to calculate the autocorrelation R[d] = Σ_{n=0}^{L−1} ĥ_LS[n] ĥ_LS[n − d], which is then used in (8); we call this method "MMSE sim" because the computation is completed with simulation results.

III. DL-BASED NN CHANNEL ESTIMATOR

To track a time varying channel, it is necessary to give the neural network the ability to learn the correlation behavior in the time domain. Thus, a good choice for handling sequence data is an RNN.

A. RNN structure

A simple example of a 1-layer RNN is given in Fig. 3a. In this structure, the output of the last time step becomes part of the input of the current time step. In this way, the RNN can capture past information. The basic RNN cell computes its result as:

h_t = Tanh(W_ih x_t + b_ih + W_hh h_{t−1} + b_hh)    (9)

where Tanh is the hyperbolic tangent function and h_t, h_{t−1} are the hidden states at time t and t − 1, respectively. x_t is the input at time t. W_ih, W_hh and b_ih, b_hh are weights and biases, which need to be learned.

However, the time varying channel h(t) is related to both past and future channel states, while the basic RNN cell is fed forward only. Thus, a bidirectional structure, as shown in Fig. 3b, gives better performance. Blue blocks are forward cells and red blocks are backward cells. The data are fed not only in the forward direction but also backward again. The hidden states h_t and h_t′ are combined to become the input of a linear layer that gives the final results.

Another problem is that the basic RNN cell with (9) cannot capture long-term information. To solve this problem, the Long Short Term Memory (LSTM) [20] cell was put forward. In this paper, the Gated Recurrent Unit (GRU) [21], a variation of the LSTM, is used to replace the basic RNN cell. The GRU computes its result as follows ([21], (5)-(8)):

z_t = σ(W_z · [h_{t−1}, x_t])    (10a)
r_t = σ(W_r · [h_{t−1}, x_t])    (10b)
h̃_t = Tanh(W · [r_t ∗ h_{t−1}, x_t])    (10c)
h_t = (1 − z_t) ∗ h_{t−1} + z_t ∗ h̃_t    (10d)

where σ(·) is the Sigmoid function f_s(x) = 1/(1 + e^{−x}), W_z, W_r, W are weights, and h_t, h_{t−1}, x_t have the same meaning as in (9). Compared with the basic RNN cell, the GRU introduces two gates, the update gate z_t and the reset gate r_t, to control the information flow. The GRU has been shown to have performance similar to the LSTM on many tasks [22] and to be faster due to its smaller number of gates.

Based on the above discussion, the BGRU cell is used in the NN channel estimator. However, the result of a simple BGRU is not good enough. The idea of the Sliding BRNN (SBRNN) [11] is adopted to improve the performance further, and the comparison between BGRU and SBGRU is given in Section IV.C.

B. SBGRU structure

The SBRNN was put forward in [11] as a detector for optical and molecular channels. Here, this structure is used for the estimation task under the time varying Rayleigh fading channel. A simple example of the sliding structure is given in Fig. 4. Each BGRU block in the figure has a fixed window length W_L. It should be stated that the selection of the window length is related to the channel characteristics. Since any two moments of the channel h are correlated, it is reasonable that the longer the window, the better the performance. Simulations on the window length are given in Section IV.D.

The SBGRU is given W_L symbols for each computation, and slides by 1 symbol after each computation. Due
Fig. 3. The structure of RNN. (a) The structure of a forward-only RNN. (b) The structure of a bidirectional RNN.
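The GRU update in (10a)-(10d) maps directly to a few lines of code. Below is a minimal NumPy sketch of one step, with the bias terms omitted as in (10) and with illustrative dimensions; the function and variable names are ours, not from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, Wz, Wr, W):
    """One GRU step following Eqs. (10a)-(10d): each weight matrix acts on
    the concatenation [h_{t-1}, x_t], and '*' is element-wise."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ hx)                                     # (10a) update gate
    r = sigmoid(Wr @ hx)                                     # (10b) reset gate
    h_cand = np.tanh(W @ np.concatenate([r * h_prev, x_t]))  # (10c) candidate state
    return (1.0 - z) * h_prev + z * h_cand                   # (10d) gated interpolation
```

As a quick sanity check of the gating interpolation (10d): with all-zero weights both gates equal 0.5 and the candidate state is 0, so each step simply halves the previous hidden state.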
TABLE I
CHANNEL AND DATA PARAMETERS
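As a concrete illustration of the channel model in Section II-A, the sketch below evaluates the discrete autocorrelation (4) without special-function libraries (using the change of variable f = f_d sin θ, under which the integral in (3) becomes a uniform average over θ) and generates unit-power Rayleigh taps with a standard sum-of-sinusoids Clarke/Jakes simulator. The function names and the number of paths are illustrative choices, not from the paper:

```python
import numpy as np

def jakes_autocorr(d, phi_d, n_grid=20001):
    """R[d] = J0(2*pi*phi_d*|d|) from Eq. (4), computed numerically:
    with f = f_d*sin(theta), the spectrum integral of Eq. (3) reduces to
    a uniform average of cos(2*pi*phi_d*d*sin(theta)) over theta."""
    theta = np.linspace(-np.pi / 2, np.pi / 2, n_grid)
    return np.mean(np.cos(2 * np.pi * phi_d * np.abs(d) * np.sin(theta)))

def clarke_channel(n_samples, phi_d, n_paths=64, rng=None):
    """Sum-of-sinusoids Clarke/Jakes simulator: each path gets a random
    angle of arrival and phase; scaling by 1/sqrt(n_paths) gives the
    normalized gain E(|h[n]|^2) = 1 assumed in Section II-A."""
    rng = np.random.default_rng(rng)
    n = np.arange(n_samples)
    alpha = rng.uniform(0.0, 2.0 * np.pi, n_paths)  # angles of arrival
    phase = rng.uniform(0.0, 2.0 * np.pi, n_paths)  # per-path phases
    omega = 2.0 * np.pi * phi_d * np.cos(alpha)     # per-path Doppler (rad/symbol)
    h = np.exp(1j * (np.outer(n, omega) + phase)).sum(axis=1)
    return h / np.sqrt(n_paths)
```

R[0] evaluates to 1 and the curve decays like the Bessel function J_0, matching (4); the empirical power of the generated taps stays close to the normalized gain of 1.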
TABLE II
NN PARAMETERS FOR SIMULATION

Parameter                  Estimator
NN architecture            SBGRU
Number of hidden layers    2
Hidden size                40 × 2 (2 for bi-direction)
Window length              40 symbols
Activation function        Tanh for hidden layers & ReLU for output layer
Loss function              MSE
Optimizer                  Adam
Learning rate              0.001
Batch size                 128
Train SNR                  20 dB
Test SNR                   5, 10, 15, 20, 25 dB
Train number               100000
Validation number          10000
Test number                10000
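Table II's window length of 40 symbols feeds the sliding scheme of Section III-B: a window of W_L symbols is processed, then the window slides by one symbol. A minimal sketch of that mechanism follows, with a placeholder `window_fn` standing in for the trained BGRU and with the assumption, taken from the SBRNN of [11] rather than stated explicitly here, that the overlapping per-window outputs are averaged position by position:

```python
import numpy as np

def sliding_estimate(seq, window_fn, wl):
    """Apply `window_fn` to every length-`wl` window of `seq` (sliding by
    one symbol, as in Section III-B) and average the overlapping outputs,
    so each position is covered by up to `wl` window estimates."""
    n = len(seq)
    acc = np.zeros(n, dtype=complex)
    cnt = np.zeros(n)
    for s in range(n - wl + 1):
        acc[s:s + wl] += window_fn(seq[s:s + wl])
        cnt[s:s + wl] += 1
    return acc / cnt
```

With an identity `window_fn`, the averaging is transparent and the input is returned unchanged, which isolates the sliding and averaging bookkeeping from the network itself.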
Fig. 7. Simulation results for channel tracking. (a) Tracking performance of the SBGRU estimator. (b) Tracking performance of the LS and MMSE estimators.
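For reference alongside the tracking comparison, the LS baseline of Eqs. (5)-(6) and the MMSE combination of Eq. (8) can be sketched as below. `snr` is the linear (not dB) SNR, standing in for σ_s²/σ_n², and holding the edge values flat outside the first and last pilot (np.interp's behavior) is our assumption:

```python
import numpy as np

def ls_estimate(y, x, pilot_pos):
    """LS at the pilot positions (Eq. (5)), then the linear interpolation
    of Eq. (6) for the remaining positions (real/imag parts separately)."""
    n = len(y)
    hp = y[pilot_pos] / x[pilot_pos]      # h_LS at the pilots
    idx = np.arange(n)
    return (np.interp(idx, pilot_pos, hp.real)
            + 1j * np.interp(idx, pilot_pos, hp.imag))

def mmse_estimate(h_ls, R_hh, snr):
    """Eq. (8): h_MMSE = R_hh (R_hh + (sigma_n^2/sigma_s^2) I)^{-1} h_LS,
    with the noise-to-signal ratio written as 1/snr."""
    L = len(h_ls)
    return R_hh @ np.linalg.solve(R_hh + np.eye(L) / snr, h_ls)
```

In the noiseless case the LS estimate is exact at the pilots and Eq. (6)'s weights fall out of the interpolation; with R_hh = I and a large snr, the MMSE step reduces approximately to the identity, as Eq. (8) predicts.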
Besides, the channel estimation problem under a similar time varying channel has been studied in [13] using an MLP neural network. Its basic idea is to feed not only the channel distorted data and the pilot data but also the estimated channel from the last block, in order to obtain better channel estimation performance. In its simulations, the estimation block length is set to be the same as that of the data structure. However, this estimation block length can be different. In order to compare the performance fairly, the NN architecture in [13] is reconstructed, trained and tested with the same settings and simulation parameters as the SBGRU simulation. Besides, three different values, 16, 32 and 40, are used to fully explore the influence of the estimation block length.

The performance comparison between MLP and SBGRU is given in Fig. 9. The MLP with estimation block length 16 (the same design as [13]) does not work very well. It is possible that the number of parameters in the NN model is not large enough, so its ability to learn the nonlinear channel is not strong. When the estimation block length increases to 32, the performance increases a bit. However, an estimation block length of 40 results in decreased performance. This is because the MLP with estimation block length 40, which is not an integral multiple of the original data block length 16, cannot fully exploit the pilot information repeated in the time domain. However, the SBGRU estimator outperforms all the above MLP estimators when the SNR is above 5 dB. Besides, thanks to the recurrent structure of the RNN, the previous channel estimate does not need to be fed into the neural network; it is captured by the SBGRU automatically.

D. Performance vs window length

Here, the influence of the sliding window length is explored. The performance for different window lengths is shown in Fig. 10. The performance monotonically increases as the window length gets longer. Except for the window length of 16 symbols, all 3 other window lengths have nearly the same performance. This shows that the window length cannot be too short, in order to have enough information to perform the estimation. However, a very long window length does not bring much more improvement. Thus, selecting a suitable window length achieves a balance between accuracy and the speed of training and testing. Overall, the setting of the window length is related to the channel characteristics.

E. Performance vs pilot density

Finally, the influence of the pilot density is examined to show the robustness of the SBGRU estimator. The performance is shown in Fig. 11. As the pilot density decreases, the MSE performance indeed degrades a little, but not seriously. The result is still much better than the LS estimate and the "MMSE sim" estimate. Thus, the SBGRU estimator is robust to different pilot densities.

Fig. 8. Performance comparison between the sliding BGRU and the non-sliding BGRU.
Fig. 9. Performance comparison between the SBGRU estimator and the MLP estimator.
Fig. 10. The influence of the sliding window length on the SBGRU estimator.
Fig. 11. The influence of the pilot density on the SBGRU estimator.

V. CONCLUSION

In this paper, a DL-based channel estimator is designed for the time varying Rayleigh fading channel. The proposed DL-based channel estimator achieves better performance than traditional algorithms and some NN estimators with different structures. Besides, the proposed NN channel estimator shows its ability to dynamically track the channel and its robustness to the pilot density.

In traditional communication, there are many more complex traditional algorithms for channel estimation. However, deep learning algorithms have some unique advantages compared with the traditional algorithms.

• Although many estimation methods have been developed for traditional communication systems, most of them assume the channel to be invariant within the coherence time. With a deep learning algorithm, prior knowledge about the channel model and the coherence-time invariance assumption are not needed during training and testing, which shows the potential of DL-based algorithms under the time varying channel.

• The channel estimator designed in this paper can easily be improved by combining it with traditional algorithms. For example, it is convenient to insert high performance channel coding before the modulation to protect the performance against Gaussian noise. Thus, the MSE performance can be further improved.

In addition, there is still a lot of work to do in applying deep learning or machine learning technology to the physical layer under time varying channels; some directions are as follows.

• Besides channel estimation, it is also feasible to construct a detector that performs equalization and demodulation together using a deep learning algorithm. Then, by connecting the NN estimator and the NN detector, a complete wireless communication system can be constructed. It is worth exploring whether such a DL-based system can achieve better bit error rate (BER) performance than a traditional system under the time varying channel while remaining robust to different pilot densities.

REFERENCES

[1] T. O'Shea and J. Hoydis, "An introduction to deep learning for the physical layer," IEEE Transactions on Cognitive Communications and Networking, vol. 3, no. 4, pp. 563-575, Dec 2017.
[2] B. Zhu, J. Wang, L. He, and J. Song, "Joint transceiver optimization for wireless communication phy using neural network," IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1364-1373, June 2019.
[3] S. Dörner, S. Cammerer, J. Hoydis, and S. t. Brink, "Deep learning based communication over the air," IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 132-143, Feb 2018.
[4] B. Le Floch, M. Alard, and C. Berrou, "Coded orthogonal frequency division multiplex [tv broadcasting]," Proceedings of the IEEE, vol. 83, no. 6, pp. 982-996, June 1995.
[5] H. Ye, G. Y. Li, and B. Juang, "Power of deep learning for channel estimation and signal detection in ofdm systems," IEEE Wireless Communications Letters, vol. 7, no. 1, pp. 114-117, Feb 2018.
[6] B. Karanov, M. Chagnon, F. Thouin, T. A. Eriksson, H. Bülow, D. Lavery, P. Bayvel, and L. Schmalen, "End-to-end deep learning of optical fiber communications," Journal of Lightwave Technology, vol. 36, no. 20, pp. 4843-4855, Oct 2018.
[7] M. Mirza and S. Osindero, "Conditional generative adversarial nets," 11 2014.
[8] H. Ye, G. Y. Li, B. F. Juang, and K. Sivanesan, "Channel agnostic end-to-end learning based communication systems with conditional gan," in 2018 IEEE Globecom Workshops (GC Wkshps), Dec 2018, pp. 1-5.
[9] J. C. Spall, "An overview of the simultaneous perturbation method for efficient optimization," 02 2001.
[10] V. Raj and S. Kalyani, "Backpropagating through the air: Deep learning at physical layer without channel models," IEEE Communications Letters, vol. 22, no. 11, pp. 2278-2281, Nov 2018.
[11] N. Farsad and A. Goldsmith, "Neural network detection of data sequences in communication systems," IEEE Transactions on Signal Processing, vol. 66, no. 21, pp. 5663-5678, Nov 2018.
[12] S. Ganesh, V. Sayee Sunder, and A. Thakre, "Performance improvement in rayleigh faded channel using deep learning," in 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Sep. 2018, pp. 1307-1312.
[13] X. Ma, H. Ye, and Y. Li, "Learning assisted estimation for time-varying channels," in 2018 15th International Symposium on Wireless Communication Systems (ISWCS), Aug 2018, pp. 1-5.
[14] Y. Yang, F. Gao, X. Ma, and S. Zhang, "Deep learning-based channel estimation for doubly selective fading channels," IEEE Access, vol. 7, pp. 36579-36589, 2019.
[15] R. H. Clarke, "A statistical theory of mobile-radio reception," The Bell System Technical Journal, vol. 47, no. 6, pp. 957-1000, July 1968.
[16] M. J. Gans, "A power-spectral theory of propagation in the mobile-radio environment," IEEE Transactions on Vehicular Technology, vol. 21, no. 1, pp. 27-38, Feb 1972.
[17] M. Livshitz, "Low density parity check (ldpc) code," Patent, 11, 2013.
[18] S. A. van de Geer, Least Squares Estimation, 10 2005, vol. 2.
[19] V. Charles Drastik, "Minimum mean squared error estimation," Bulletin of The Australian Mathematical Society, vol. 30, 10 1984.
[20] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, pp. 1735-1780, 12 1997.
[21] K. Cho, B. van Merriënboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using rnn encoder-decoder for statistical machine translation," 06 2014.
[22] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, "Lstm: A search space odyssey," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222-2232, Oct 2017.
[23] D. Kingma and J. Ba, "Adam: A method for stochastic optimization," International Conference on Learning Representations, 12 2014.