Wireless Channel Estimation with GANs
Abstract
This paper presents a novel compressed sensing (CS) approach to high dimensional wireless channel
estimation by optimizing the input to a deep generative network. Channel estimation using generative networks
relies on the assumption that the reconstructed channel lies in the range of a generative model. Channel
reconstruction using generative priors outperforms conventional CS techniques and requires fewer pilots. It also
eliminates the need for a priori knowledge of the sparsifying basis, instead using the structure captured by the
deep generative model as a prior. Using this prior, we also perform channel estimation from one-bit quantized
pilot measurements, and propose a novel optimization objective function that attempts to maximize the
correlation between the received signal and the generator’s channel estimate while minimizing the rank of the
channel estimate. Our approach significantly outperforms sparse signal recovery methods such as Orthogonal
Matching Pursuit (OMP) and Approximate Message Passing (AMP) algorithms such as EM-GM-AMP for
narrowband mmWave channel reconstruction, and its execution time is not noticeably affected by the increase in the number of pilot symbols.
The authors are with the University of Texas at Austin, TX, USA. Contact Author Email: [email protected]. This work was supported in part by Intel. This paper was presented in part at the 21st IEEE Signal Processing Advances in Wireless Communications Workshop, May 2020 in the Special Session for Machine Learning in Communications [1].

Index Terms
MIMO channel estimation, Generative Adversarial Networks (GAN), compressed sensing, one-bit receivers
I. INTRODUCTION
A. Motivation
To meet the demand for extremely high bit rates and much lower energy consumption per bit, future
wireless systems are trending to bandwidths larger than 1 GHz and carrier frequencies above 100 GHz.
As an example, a future communication system (6G and beyond) may operate at a carrier frequency
approaching 300 GHz with well over 10,000 cross-polarized antenna elements at each transceiver,
and antenna spacings on the order of 1-2 mm [2], [3]. For channel estimation in many antenna
systems, typically the number of pilots is assumed to be larger than the number of transmit antennas,
resulting in significant training overhead which does not scale well to such high dimensional future
communication systems. Using sparsity with a compressed sensing method to alleviate this problem
leads to solving a complex optimization problem at every coherence interval, whose complexity
scales with the number of antennas, will become infeasible. Thus, existing approaches to channel
estimation will not scale to this regime in terms of complexity, power consumption, or pilot overhead,
and fundamentally new methods are needed. The key to simplifying channel estimation in such high
dimensional systems is to exploit stronger prior knowledge of the channel structure. In this paper we
propose a novel unsupervised learning-based approach using deep generative networks for channel
estimation.
B. Related Work
Traditional training-based channel estimators such as least-squares (LS) are optimum Maximum
Likelihood estimators for rich multipath channels. Furthermore, for Gaussian signal recovery with a
known correlation matrix, minimum mean-squared error (MMSE) estimators find the signal estimate x that maximizes the a posteriori probability p(x|y) and outperform LS [4]. However, recent channel
measurements conducted for mmWave and THz cellular systems have indicated that, due to clustering
of the paths into small, relatively narrowbeam clusters, high dimensional channels are often very
sparse in their beamspace representation [2] or their spatial covariance matrix is low rank [5]. Among
the first papers to highlight the need to exploit these sparse structures, which LS and MMSE cannot, was [6], which used channel sparsity in the beamspace representation of a multi-antenna
channel to formulate channel estimation as a CS problem, while [7] also highlighted how to exploit
sparsity in the delay-Doppler domain. MmWave channel estimation is made difficult by the low
received SNR due to high omnidirectional path loss, and to combat this path loss, large antenna arrays
are used to obtain beamforming gain. In [8], a sparse formulation of the mmWave channel estimation
problem was given by expressing the sensing matrix Ψ as a function of the transmit and receive
antenna array response vectors, in addition to the training precoders and combiners. An open loop
strategy for downlink mmWave channel estimation and design of precoders/combiners that minimize
the coherence of Ψ, while incorporating hybrid constraints at the transceiver, was presented in [9],
enabling reconstruction from a small number of measurements. In [8] and [9], Orthogonal Matching
Pursuit (L0 norm minimization) and Basis Pursuit Denoising (L1 norm minimization) were employed
for sparse channel reconstruction. Approximate Message Passing (AMP) is another robust class of
techniques for compressed sensing [10], and variants such as EM-GM-AMP [11] and VAMP [12]
outperform OMP and BPDN for a large class of sensing matrices. AMP has been widely advocated
for MIMO channel estimation in the research community, especially for low resolution receivers
[13], [14]. AMP has even been extended to adaptively learn the clustered structure in the angle-delay
domain in [15].
However, real world channels are never exactly sparse in the DFT-basis, nor do we know the basis
that would yield the most sparse representation. Moreover, all these techniques involve solving a
complex optimization problem at each interval, and require a large number of pilots, especially in low
resolution receivers. These are some of the reasons why CS-based methods are still not employed in
conventional WiFi receivers for channel estimation, which typically employ LS channel estimation
with frequency domain smoothing that leverages the coherence bandwidth of the channel [16].
Meanwhile, there has been a rapid advancement in the application of techniques from deep learning
to channel estimation for massive MIMO and mmWave systems. One of the approaches taken was
to perform Joint Channel Estimation and Signal Detection (JCESD) [17], [18], thus performing
channel estimation implicitly. Not recovering the channel estimate prevents precoder and/or combiner
optimization, and these techniques call for extensive signal processing changes at the transceiver.
One obvious way to recover the channel estimate is to train a Neural Network (NN) in a supervised
manner, such that it is trained to take as input the pilot measurements and output the channel matrix.
This approach is taken in many recent papers [19]–[23]. In particular, [19] also appends the LS
channel estimate of the current and previous received pilot signal to the NN’s input to improve its
performance. In [20], a variant of the AMP technique called LDAMP is unfolded into a NN, by
making the parameters of LDAMP learnable. Exploiting the inherent structure in a spatial channel
matrix, making its estimation analogous to image reconstruction, [21] and [22] employ Convolutional
Neural Networks (CNNs) in place of Fully Connected NNs to learn a channel estimator. A novel
refinement called SIP-DNN was proposed in [23], that chose to estimate the channel at all the receive
antennas using only the signal received by the high-resolution ADC antennas.
However building such labeled channel datasets for a supervised task is time-consuming, and
most of these techniques would not perform well if the received signal was corrupted by hardware
impairments and/or transient effects such as shadow fading. A few papers recently have been using
techniques from unsupervised learning to overcome this limitation of having to build a huge labeled
dataset. In [24] and [25], the authors combine an LS estimator with an underparameterized CNN-based
denoiser called Deep Decoder [26] to exploit correlation in the channel estimate to improve its quality.
In [27], an autoencoder is trained to learn a compressed representation for the channel that could immensely reduce channel state information (CSI) feedback overhead in massive MIMO, while [28] extends this idea to multi-resolution CSI feedback.
C. Contributions
In summary, most of the proposed Deep Learning techniques are discriminative, meaning that, unlike generative models, they do not exploit a priori information, and they often call for drastic changes in transceiver signal processing, while the existing signal processing techniques are designed for
sparse signal recovery. As mentioned before, there is no explicit way to determine the basis that will
generate the sparse channel representation with the least non-zero entries, which would allow perfect
recovery for a wider range of sensing matrices with the same number of measurements. This is where
compressed sensing using generative models proves useful. By finding an approximate solution in the
span of a generative model, [29] shows how to achieve CS guarantees without employing sparsity.
The authors of [29] present a simple gradient descent based algorithm that enables signal recovery for
inherently sparse or structured signals from compressed measurements by exploiting the prior learnt
by a generative model. In this paper, we draw inspiration from the approach presented in [29] to
perform the estimation of high dimensional wireless channels from compressive pilot measurements.
Training a GAN to learn the channel distribution: The underlying probability distribution of
spatial channel matrices for a particular environment can be very complex, and analytically intractable.
We describe how to train a Wasserstein GAN [30] using a set of simulated channel realizations, such
that it learns a generator model that is capable of drawing samples from the underlying channel
distribution.
Full resolution channel estimation: The trained generative model will output channel realizations
for different input vectors. We describe a procedure to find the optimal input vector such that we can
use the prior of the trained generator to find the channel estimate from a low number of noisy pilot
measurements. Moreover, the optimization problem defined by the generative network operates in a low dimensional latent space and achieves a significant reduction in computational complexity. Simultaneously, our technique also develops a channel representation that drastically reduces CSI feedback overhead.
One-bit quantized channel estimation: We design a custom loss function that aims to find
the channel estimate, in the range of the generator’s output, that has low rank while maximizing
correlation with the received one-bit measurements. We compare its performance with state-of-the-art
CS techniques such as EM-GM-AMP [11], and find that it significantly improves the quality of the
channel estimate, while still requiring only a limited number of pilots. We validate the improvement
in the channel estimate by evaluating the throughput for a hybrid precoded data transmission, where the precoders are designed using the recovered channel estimate.
The paper is organized as follows. The system model is outlined in Section II. The generative channel
estimator is explained in detail in Section III, the NN architecture details, simulation benchmarks and
results are outlined in Sections IV and V respectively, and the conclusions are highlighted in Section VI.
II. SYSTEM MODEL

Consider a point-to-point downlink (DL) MIMO setup, where the base station (BS) is equipped with Nt
transmit antennas and the User Equipment (UE) is equipped with Nr receive antennas. For simplicity,
the exposition that follows considers only a single narrowband frequency channel but can easily be
extended to multiple (Nf > 1) subcarriers. We consider hybrid beamformers and combiners.

A. Channel Estimation Phase
In the DL channel estimation phase, the BS uses a training beamformer p ∈ CNt ×1 to transmit a
symbol s ∈ C. To simplify analysis, we set s = 1 in all experiments, but retain it in the equations for
ease of understanding. The UE employs Nr RF chains, hence for each beamforming vector p, Nr
measurements are produced at the UE. We assume that the training combiner qi ∈ C^{Nr×1}, i ∈ [Nr], is a 1-sparse vector with 1 at the ith position. As explained in [9], the number of measurements per time instant at the UE does not depend on the number of RF chains employed at the BS. A sequence of Np training beamforming vectors is employed by the BS during training. We denote this sequence as P = [p1 ... pNp] ∈ C^{Nt×Np}. It is
assumed that the channel coherence time is greater than Np T , where T is the symbol period, hence
the spatial channel matrix H ∈ C^{Nr×Nt} remains constant over the Np time slots. Hence the received signal matrix is given by

Y = HPs + N,    (1)

where the elements of N ∈ C^{Nr×Np} are independent and identically distributed complex Gaussian random variables with mean 0 and variance σ². To obtain more compact expressions, the matrices are vectorized as

y = HPs + n,    (2)
where y, HP, n ∈ C^{Nr Np×1} denote the vectorized forms of Y, HP and N respectively. Writing HP as I_{Nr}HP, and utilizing the expansion vec(ABC) = (C^T ⊗ A) vec(B), we have

y = (P^T ⊗ I_{Nr}) vec(H) s + n,    (3)

where ^T denotes the transpose operator and ⊗ denotes the Kronecker product. Clearly, the system of
equations represented by (3) does not have a unique solution if Np < Nt . In other words, the LS
has multiple solutions. Thus, in the low density pilot regime, one cannot directly use LS channel
algorithms, and is explained as part of the baselines for comparison in Section IV-E. The above
notation also extends easily to the case where the received signal is quantized, with (3) being rewritten
as
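As a quick sanity check on the vectorization step behind (3), the following NumPy snippet (illustrative only, with arbitrary small dimensions) numerically verifies vec(HP) = (P^T ⊗ I_{Nr}) vec(H) under column-major vectorization:

```python
import numpy as np

Nr, Nt, Np = 4, 8, 3
H = np.random.randn(Nr, Nt) + 1j * np.random.randn(Nr, Nt)
P = np.random.randn(Nt, Np) + 1j * np.random.randn(Nt, Np)

# vec(HP) stacked column-major (Fortran order), matching vec(ABC) = (C^T kron A) vec(B)
lhs = (H @ P).flatten(order='F')
rhs = np.kron(P.T, np.eye(Nr)) @ H.flatten(order='F')
assert np.allclose(lhs, rhs)
```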
B. Data Transmission Phase

In Section II-A, we presented a training-based channel estimation approach, hence the training
beamformers used were random sequences of QPSK symbols (one can also use a random subset of
the columns of the DFT matrix or the Zadoff-Chu sequences). Now we move from the training stage
to the data transmission phase, where to obtain a higher throughput, one performs optimization of
the precoder matrices FRF and FBB at the BS, in which P = FRF FBB . To achieve this, the channel
estimate recovered at the UE is conveyed to the BS, to maximize the information-theoretic capacity,
while incorporating the hardware and power constraints imposed on the entries of FRF and FBB . As
outlined in [31], we utilize spatially sparse precoding via Orthogonal Matching Pursuit to find the precoders FRF and FBB.
III. GENERATIVE CHANNEL ESTIMATION

We use Generative Adversarial Networks (GANs) for training a generative model. Despite the
extensive recent application of deep learning to wireless communications, few communication papers
have employed GANs, owing to their perceived training instability [32]. In [33], the authors proposed
the use of variational GANs to accurately learn the channel distribution. However, they restricted
themselves to additive noise, and did not consider fading or MIMO. In [34], the authors employ a conditional
GAN that is trained to output the received signal when the transmitted signal and the received pilot
information is appended to the input of the GAN. However, when extending it to fading channels,
they assumed that the real channel response was available as input to the GAN. Moreover, none of
these papers exploit the compressed representation that the generator of a GAN learns for a given
output signal. We now give an overview of the training procedure for GANs in the context of spatial channel matrix generation.
A GAN [35] consists of two feed-forward neural networks, a generator G(z; θg ) and a discriminator
D(x; θd ) engaging in an iterative two-player minimax game with the value function V (G, D):
min_G max_D V(G, D) = E_{x∼Pr(x)} [hD(D(x; θd))] + E_{z∼Pz(z)} [hG(D(G(z; θg); θd))],    (6)
where G(z) represents a mapping from the input noise variable z ∼ Pz (z) to the data space
x ∼ Pr (x), while D(x) represents the probability that x came from the data rather than G. The
exact form of h(.) depends on the choice of loss function. In [35], hD (D(x)) = log D(x) whereas
hG(D(G(z))) = log(1 − D(G(z))). On the other hand, in the Wasserstein GAN proposed in [30], hD(D(x)) = D(x) and hG(D(G(z))) = −D(G(z)). Typically, z ∼ N(0, Id) and d ≪ n. For example, when a GAN is trained on an image dataset, d can be 100, while n = 64 × 64 × 3 = 12288 (where 64 represents the image height and width in pixels and 3 represents the RGB color triplet). G is said to implicitly learn the distribution Pg (stored in its weights θg).
Since the seminal paper [35], numerous variants of GAN have been published, differing in the
architecture and/or training procedure of G and D or the loss function used for penalizing the output
of D [36], [30]. However, GANs are known to be difficult to train, one of the reasons being that they
are subject to mode collapse. That is, they learn to characterize only a few modes of the distribution
[32]. The objective of training a GAN is that by varying the weights θg and θd of G(z; θg ) and
D(x; θd), we want Pg → Pr. In [30], the Wasserstein-1 (Earth Mover's, EM) distance is shown to be much weaker¹ than the KL or JS divergences: there exist sequences of probability distributions that converge under EM but not under KL or JS. Using the continuous and differentiable EM distance as the loss function for the output of D during training with weight clipping eliminates the need for a careful balance in the training of D and G, and for careful design of the NN architecture. It also drastically reduces
mode collapse since we can train D to optimality. Hence in this paper, we employ the Wasserstein
GAN [30] for learning the spatial channel distribution. An outline of the procedure for training a
Wasserstein GAN in the context of spatial channel matrix generation is given in Alg. 1 (adapted from [30]²).
Algorithm 1: Minibatch stochastic gradient descent training of Wasserstein GANs for spatial channel matrix generation, with nd = 5 and c = 0.01. D should output 1 for a true channel realization x ∼ Pr(x) and 0 for a generated fake channel realization G(z) ∼ Pg when z is sampled from Pz.

for number of training iterations do
    for nd iterations do
        • Sample a minibatch of m noise samples {z1, ..., zm} ∼ Pz. Update D by ascending its stochastic gradient ∇θd (1/m) Σ_{i=1}^{m} −D(G(zi)).
        • Sample a minibatch of m channel realizations {x1, ..., xm} ∼ Pr. Update D by ascending its stochastic gradient ∇θd (1/m) Σ_{i=1}^{m} D(xi).
        • θd = clip(θd, −c, c)
    end
    • Sample a minibatch of m noise samples {z1, ..., zm} ∼ Pz. Update G by descending its stochastic gradient ∇θg (1/m) Σ_{i=1}^{m} −D(G(zi)).
end
¹A set of probability distributions Pn is said to converge to P∞ under a distance metric ρ if ρ(Pn, P∞) → 0 as n → ∞. By "weaker", we mean that the set of convergent sequences under EM is a superset of the sequences convergent under KL or JS.
²The original paper [30] refers to the discriminator as a critic and uses ncritic = 5, which we refer to as nd.
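For concreteness, a minimal PyTorch sketch of the update rules in Alg. 1 is given below. The fully connected generator and critic and the sample_real stub are stand-ins for the convolutional networks of Section IV-C and the normalized channel dataset, and the RMSprop learning rate is an assumption; nd = 5 and c = 0.01 follow Alg. 1.

```python
# Sketch of Alg. 1 (WGAN with weight clipping). Placeholder MLPs; the paper's
# actual networks are DCGAN-style CNNs.
import torch
import torch.nn as nn

d, n = 35, 2 * 16 * 64                     # latent dim; flattened (Nr, Nt, 2)
m, n_d, c, lr = 64, 5, 0.01, 5e-5          # lr is an assumed value

G = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, n))
D = nn.Sequential(nn.Linear(n, 256), nn.ReLU(), nn.Linear(256, 1))
opt_g = torch.optim.RMSprop(G.parameters(), lr=lr)
opt_d = torch.optim.RMSprop(D.parameters(), lr=lr)

def sample_real(m):
    # Stand-in for a minibatch of normalized channel realizations H_G
    return torch.randn(m, n)

for it in range(1000):
    for _ in range(n_d):                   # n_d critic updates per G update
        z = torch.randn(m, d)              # z ~ P_z = N(0, I_d)
        x = sample_real(m)
        # ascend E[D(x)] - E[D(G(z))]  ==  descend its negative
        loss_d = -(D(x).mean() - D(G(z).detach()).mean())
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        with torch.no_grad():              # theta_d = clip(theta_d, -c, c)
            for p in D.parameters():
                p.clamp_(-c, c)
    z = torch.randn(m, d)
    loss_g = -D(G(z)).mean()               # descend E[-D(G(z))]
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```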
A gradient descent based approach for compressed sensing using generative networks was proposed in [29] to find the low dimensional representation z∗ of a given input image x∗ such that the reconstructed image G(z∗) has small measurement error ||y − AG(z)||_2^2. While this is a non-convex objective to optimize (since G(z) is a non-convex function of z), gradient descent was found empirically to work well. To reconstruct the image, [29] solves the following optimization problem:

z∗ = arg min_z f(y, AG(z)),    (7)
where y is the vector of received samples, G is a generative model, A is a measurement matrix, and
f is a loss function. For example, we could have f (y, AG(z)) = ||y − AG(z)||22 . Here, we minimize
the loss function over the input variable to the generator z. The reconstructed image is then G(z ∗ ).
As long as gradient descent finds a good approximate solution to (7), [29] gives a theoretical proof
to show that G(z∗) will be almost as close to the true x∗ as the closest possible point in the range of G.
To adapt the framework presented in [29] for channel estimation, we first train a Wasserstein GAN
[30] using a set of realistic channel realizations H (details of channel parameters presented in Section
IV) as defined in (1). We then extract the trained generator G. The trained generator, having implicitly
learned the underlying probability distribution of the channel matrices, will output channel realizations
G(z) for a given L2 bounded input vector z. In the testing phase, we will be given the noisy pilot
measurements y as defined in (3). We consider two possible cases: when the measurements are full
resolution and when they are one-bit quantized. For each case, we have heuristically developed loss
functions, that define the optimization problem to be solved at every coherence interval using gradient
descent. An illustration of the framework is shown in Fig. 2 and the approach is summarized in Alg. 2.
Full Resolution Channel Estimation: Replacing the sensing matrix A by P^T ⊗ I_{Nr} as derived in (3), and imposing an L2 bound on z via regularization, we attempt to solve the following non-convex optimization problem:

z∗ = arg min_{z∈R^d} ||y − (P^T ⊗ I_{Nr}) G(z) s||_2^2 + λreg ||z||_2^2,    (8)

where d is the dimension of the input vector to the GAN and λreg serves as a regularization parameter.
The reconstructed channel estimate is then simply G(z ∗ ). Note that the entries in the training
precoder P were chosen i.i.d. from QPSK symbols. As a consequence, all the entries of the matrix A = P^T ⊗ I_{Nr} are bounded (being either 0 or QPSK symbols) with mean 0, and from Hoeffding's Lemma³ applied separately to the real and imaginary parts, it follows that each entry of A will be sub-Gaussian.
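A minimal PyTorch sketch of this recovery step is shown below, assuming a wrapper G that maps z ∈ R^d to the flattened complex channel estimate G(z) (and s = 1 as in our experiments); the Adam settings follow Section IV, while the value of λreg here is an assumption.

```python
import torch

def gce_full_res(y, A, G, d=35, lam_reg=1e-3, iters=100, lr=1e-2):
    # y: (Nr*Np,) complex pilot measurements; A = kron(P.T, I_Nr), complex.
    # lam_reg is an assumed value; lr and iters follow Section IV.
    z = torch.randn(d, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        h = G(z)                                  # flattened complex G(z)
        loss = (y - A @ h).abs().pow(2).sum() + lam_reg * z.pow(2).sum()
        loss.backward()
        opt.step()
    return G(z.detach())                          # channel estimate G(z*)
```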
Quantized Channel Estimation: We now consider the case where the received signal is 1-bit
quantized. As a result, MIMO channel estimation even in the noiseless setting with sufficient pilot
symbols is an under-determined problem. The authors of [37] exploit the low-rank nature of mmWave
channels (due to clustering in the propagation environment) to constrain the space of channel estimates
to matrices H with low nuclear norm ||H||∗ (a relaxation of the low-rank constraint). In [38], the
authors solve the same optimization problem as (7) with the measurements y being one-bit quantized,
and under certain assumptions on the measurement matrix (A in (7)) and the architecture of the GAN,
design a custom loss function to solve for z ∗ . We draw inspiration from the approach taken in [37]
and [38] to design the following non-convex optimization problem for recovery in one-bit setting:
z∗ = arg min_{z∈R^d} −λreg Σ_{i=1}^{Np Nr} Q1(yi) ⟨(P^T ⊗ I_{Nr})_i, G(z)⟩ s + ||G(z)||∗.    (9)
This heuristically designed loss function attempts to minimize the nuclear norm ||G(z)||∗ of the
output of the generator G(z) while maximizing the correlation between Q1(y) (which is ±1) and ⟨(P^T ⊗ I_{Nr}), G(z)⟩. The summation in (9) should be interpreted as the sum over the real and imaginary parts, separately:

Σ_{i=1}^{Np Nr} Q1(yi,real) ⟨(P^T ⊗ I_{Nr})_{i,real}, G(z)_{real}⟩ s + Σ_{i=1}^{Np Nr} Q1(yi,imag) ⟨(P^T ⊗ I_{Nr})_{i,imag}, G(z)_{imag}⟩ s.    (10)
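A sketch of evaluating (9)-(10) in PyTorch is given below, again assuming a wrapper G that returns the flattened complex channel estimate; the value of λreg and the (Nr, Nt) = (16, 64) reshape are assumptions based on the simulation setup of Section V.

```python
import torch

def one_bit_loss(z, y, A, G, lam_reg=1.0, Nr=16, Nt=64):
    # y: complex unquantized received pilots; Q1 is the sign of each part.
    h = G(z)                                      # flattened complex G(z)
    Ah = A @ h
    corr = (torch.sign(y.real) * Ah.real).sum() \
         + (torch.sign(y.imag) * Ah.imag).sum()
    H_hat = h.reshape(Nr, Nt)                     # spatial channel matrix
    nuc = torch.linalg.matrix_norm(H_hat, ord='nuc')
    return -lam_reg * corr + nuc                  # minimized over z
```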
Having recovered the channel estimate G(z ∗ ) from the compressed pilot measurements at the UE,
we now use this channel estimate to design the optimum RF and baseband precoder FRF and FBB .
The optimal latent input vector z∗ of the generator G provides a compressed representation of the channel.

³Hoeffding's Lemma states that for any random variable X with E[X] = 0 such that a ≤ X ≤ b w.p. 1, for all s ∈ R, E[exp(sX)] ≤ exp(s²(b − a)²/8). Hence X is sub-Gaussian with variance proxy (b − a)²/4.

If we could convey the weights and architecture of the generator from the UE to the BS
during the initial access phase, then in subsequent data transmissions, the CSI overhead would be
considerably reduced. At every coherence time, we would simply feedback z ∗ to the BS and use
G(z ∗ ) as the channel estimate to design the precoder matrices FRF and FBB .
IV. SIMULATION SETUP

A. Data Generation
Channel realizations have been generated using the 5G Toolbox in MATLAB in accordance with
the 3GPP specifications TR 38.901⁴. The channel simulation parameters are listed in Table I. In
order to generate structure in the channel realizations, some degree of correlation is required between
neighbouring antennas at the BS and the UE. To generate this correlation, the antenna element spacing
⁴https://www.etsi.org/deliver/etsi_tr/138900_138999/138901/14.00.00_60/tr_138901v140000p.pdf
in the uniform linear arrays (ULA) at the BS and UE were assumed to be λ/10. This reduced antenna
spacing is a crucial assumption, and we will justify its requirement in Section V-D. Each channel
realization generated in MATLAB was of dimension (Nf , 12, Nr , Nt ), the first and second dimension
being the number of subcarriers and number of OFDM symbols respectively. To focus on exploitation
of the spatial structure of the channel matrices, we simply extract the (Nr , Nt ) matrix corresponding
to the first subcarrier and first OFDM symbol for the purpose of these simulations.
B. Data Pre-processing
Note that G(z) has dimensions (Nt , Nr , 2), where the last entry corresponds to the real and
imaginary part. Thus, in the training dataset, H has to be split up into its real and imaginary part and
concatenated to obtain HG ∈ RNr ×Nt ×2 , while G(z) has to be reshaped as a complex-valued matrix
before being utilized for optimization in (8) or (9). Before using the data for training the GAN, we normalize each entry of HG as

H̄G,i = (HG,i − μi)/σi,

where i ∈ [2Nt Nr], subscript i denotes the ith element in the array, and μi and σi are the element-wise mean and standard deviation computed over the training dataset. While testing, we do not have access to the element-wise mean and variance, hence we continue to use the training set statistics. We performed a simulation to ascertain the impact of this artifact, and found it was negligible. The
need for normalization arises from empirical evidence that the GAN is unable to learn mean-shifted
distributions [32].
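A sketch of this pre-processing, under the assumption that it is a standard element-wise standardization with the training-set statistics reused at test time, is:

```python
import numpy as np

def fit_normalizer(H_train):
    # H_train: (num_samples, Nr, Nt, 2) real/imag-stacked channel realizations
    return H_train.mean(axis=0), H_train.std(axis=0)

def normalize(H, mu, sigma):
    # same training-set mu, sigma applied at both train and test time
    return (H - mu) / sigma
```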
C. NN Architecture of Generator
The GAN was implemented in Keras and PyTorch, with the basic implementation given online.
The generator and discriminator employed in the Wasserstein GAN were Deep Convolutional NNs.
While the discriminator architecture was adopted from [30], the generator was fine-tuned to improve
its ability to learn the underlying probability distribution and its architecture is described next.
The generator G takes an input z ∈ Rd , passes it through a dense layer with output size 128Nt Nr /16,
and reshapes it to an output size of (Nt /4, Nr /4, 128). This latent representation is then passed through
k = 2 layers, each consisting of the following units: upsampling, 2D Convolution with a kernel
size of 4 and Batch Normalization. At each stage, 2 × 2 upsampling is performed, i.e. the input is
reshaped from (Nt /n, Nr /n, 128) to (2Nt /n, 2Nr /n, 128) by replicating the corresponding values.
The performance of the generator is sensitive to this choice of sampling factor, with oversampling
of 4 and above preventing the generator from learning the channel distribution. Similarly, a kernel
size of 4 corresponds to using a 4 × 4 filter in the first two dimensions to replace each value by
a weighted average of the neighboring values that are within a 4 × 4 square surrounding it. Both
upsampling and 2D convolution thus model the local correlations in a spatial channel matrix, with
larger upsampling and size of kernel filter corresponding to a greater estimated spatial correlation. It
is finally passed through a 2D Convolutional layer with a kernel size of 4 and linear activation to produce the output G(z) of dimension (Nt, Nr, 2).
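A PyTorch sketch of this generator is given below (channels-first, so the (Nt/4, Nr/4, 128) tensor becomes (128, Nt/4, Nr/4)); the ReLU activations inside the blocks are an assumption, as the text does not specify them.

```python
import torch.nn as nn

Nt, Nr, d = 64, 16, 35

generator = nn.Sequential(
    nn.Linear(d, 128 * (Nt // 4) * (Nr // 4)),    # dense: 128*Nt*Nr/16 units
    nn.Unflatten(1, (128, Nt // 4, Nr // 4)),     # reshape latent feature map
    # two blocks of 2x2 upsampling -> 4x4 conv -> batch norm
    nn.Upsample(scale_factor=2),
    nn.Conv2d(128, 128, kernel_size=4, padding='same'),
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.Upsample(scale_factor=2),
    nn.Conv2d(128, 128, kernel_size=4, padding='same'),
    nn.BatchNorm2d(128), nn.ReLU(),
    # final 4x4 conv with linear activation: two output planes (real, imag)
    nn.Conv2d(128, 2, kernel_size=4, padding='same'),
)
```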
D. Training and Optimization

The training and test parameters for the Wasserstein GAN are specified in Table II. The generator
thus obtained is utilized in the GCE, to find the optimal z ∗ for each channel realization in the test
dataset. To minimize the loss function in (8) or (9), as the case may be, we use two approaches.
A derivative-free optimization procedure known as Powell's conjugate direction method [39], with a relative error tolerance of ε = 10⁻⁵, was employed in minimizing (8) and (9) for the generative model trained in Keras, since a trained Keras model does not provide for differentiation of the loss function in (8) with respect to the input vector z. However, as explained in [40], PyTorch allows automatic differentiation, and hence an Adam [41] optimizer with a learning rate of η = 10⁻² and an iteration count of 100 is utilized in minimizing (8) for the generative model trained in PyTorch.
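A sketch of the derivative-free variant using SciPy's implementation of Powell's method is shown below; G_numpy is an assumed NumPy wrapper around the trained Keras generator, and the value of λreg is again an assumption.

```python
import numpy as np
from scipy.optimize import minimize

def gce_powell(y, A, G_numpy, d=35, lam_reg=1e-3):
    # G_numpy: z (d,) -> flattened complex channel estimate via the Keras model
    def loss(z):
        h = G_numpy(z)
        return np.sum(np.abs(y - A @ h) ** 2) + lam_reg * np.sum(z ** 2)
    res = minimize(loss, np.random.randn(d), method='Powell', tol=1e-5)
    return G_numpy(res.x)                         # channel estimate G(z*)
```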
E. Baselines for Comparison

In this subsection, we describe the baselines used for assessing the performance of the GCE. Since
we consider the narrow-band clustered channel model, we can use the virtual channel model [42] to
obtain a sparse representation of the channel matrix in the DFT basis. More specifically, assuming
uniform spaced linear arrays at the transmitter and receiver, the array response matrices are given by
the unitary DFT matrices AT ∈ C^{Nt×Nt} and AR ∈ C^{Nr×Nr}. Then we can represent H in terms of a sparse beamspace matrix Hv as H = AR Hv AT^H. Substituting this into (3), we obtain

y = ((AT^H P)^T ⊗ AR) vec(Hv) s + n.    (16)
A variety of Matching Pursuit (MP) and Approximate Message Passing (AMP) algorithms have been developed to recover Hv from (16). Denoting Asp = ((AT^H P)^T ⊗ AR)s, the sparse recovery problem can be written as

min ||vec(Hv)||_0  subject to  ||y − Asp vec(Hv)||_2 ≤ ε.    (17)

i) Orthogonal Matching Pursuit (OMP): We directly solve (17) using OMP, as described in [9]. The stopping criterion for OMP is based on the power of the residual error: we stop when the energy of the residual falls below a threshold set by the noise variance⁹.

ii) Lasso Baseline: Consider the L1 convex relaxation of (17), and use Basis Pursuit Denoising to solve it. However, all the norms and matrices involved are complex valued. Hence, an L1 norm minimization problem gets converted into a second order conic programming (SOCP) problem [43], and can be solved using CVXPY [44].
iii) EM-GM-AMP: Approximate Message Passing algorithms such as EM-GM-AMP [11] are well-
established Bayesian techniques for sparse signal recovery from noisy compressive linear measurements
with recovery guarantees that are known to hold for a large class of sensing matrices. Using the EM-GM-AMP implementation
described in [11], we input y and Asp and recover the channel estimate Hv, which is then used to recover H using the array response matrices AT and AR.

⁹The maximum number of iterations for OMP is set to 100. If OMP is allowed to run further, it fits to the noise at low SNR and the NMSE increases.

It is to be noted that assuming an antenna
spacing of λ/10, with the columns of AT as well as AR being independent, leads to the entries of Hv
being correlated. This correlation is however not exploited by EM-GM-AMP. Improved benchmarking
comparisons with algorithms such as EMturboGAMP [45] that attempt to exploit structured sparsity are left to future work. These are the three sparse signal recovery baselines - each requiring knowledge of the sparsifying basis - that we use to assess the performance of the proposed GCE. It should be noted that the beamspace sparsity is not exploited by the GCE.
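For reference, a minimal sketch of baseline i) is given below: greedy column selection over Asp with a least-squares re-fit, stopped on the residual-energy criterion; the threshold eps is an assumed parameter.

```python
import numpy as np

def omp(y, A, eps, max_iter=100):
    # Pick the column of A most correlated with the residual, re-fit by
    # least squares, and stop once the residual energy drops below eps.
    support, r = [], y.copy()
    x_s = np.zeros(0, dtype=complex)
    for _ in range(max_iter):
        if np.linalg.norm(r) ** 2 <= eps:
            break
        support.append(int(np.argmax(np.abs(A.conj().T @ r))))
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ x_s
    x = np.zeros(A.shape[1], dtype=complex)
    x[support] = x_s                              # sparse beamspace estimate
    return x
```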
V. RESULTS
The first experiment performed is to determine the optimal latent dimension d of the input z to the
generator. Ideally, CS techniques would determine d in the absence of noise, hence we fix the SNR
at a high value of 40 dB, and evaluate the NMSE as a function of the number of pilot measurements Np.

Fig. 3. NMSE vs. α = Np/Nt for varying dimension d ∈ {25, 35, 45} of the input z to the generator G.
From Fig. 3, we can see that d = 35 appears sufficient with Np/Nt = 0.4. Increasing the number
of pilot measurements Np /Nt beyond 0.4 does not have any measurable impact on the NMSE. This
indicates that any more measurements would not improve the accuracy of the channel prediction. More
importantly, it highlights that there exists a compressed representation for the channel in an unknown
basis, but using the optimal latent input vector z ∗ defined in (8), we can recover the channel prediction
perfectly without knowing, for example, that the channel is sparse in the DFT basis. We obtain a
nearly 50x compressed representation of the channel, with under 40 parameters needed to represent a 16 × 64 channel matrix realization (= 2048 real values). While current mmWave channel estimation techniques focus on the optimal design of training precoders and combiners under the assumption of either virtual channel models [42] or UIU models [46], among others, the GCE minimizes the need
for their optimal design and provides a model-free approach for representing inherently sparse or
structured channels. This may prove valuable for future deployments at progressively higher carrier
frequencies, where these models may not hold. With d = 35, we now vary the SNR, and observe the
NMSE vs. SNR for varying α = Np /Nt in the case of full-resolution and one-bit quantized pilot
measurements. The OMP, Lasso and EM-GM-AMP baselines are also plotted.
As shown in Fig. 4, the GCE offers a large improvement in NMSE, of at least 5 dB at an SNR of -10 dB and up to 8 dB at an SNR of 15 dB for α = 0.2, over the EM-GM-AMP baseline. The GCE's performance also does not change significantly as α increases from 0.4 to 0.75, indicating that the prior learnt by the generator G is informative enough to require only 40% of the total number of pilots that would have been needed by a well-posed channel estimation problem to reconstruct the channel. Moreover, the improvement in NMSE offered by the GCE decreases as α increases from 0.2 to 1, with the gap between EM-GM-AMP and GCE being reduced to 2 dB at an SNR of 15 dB and α = 1. However, at low and medium SNR, the GCE continues to outperform all CS based
methods significantly.
Fig. 4. NMSE vs. SNR for various values of α = Np /Nt . The α values are [0.2, 0.4, 0.75, 1]. The Lasso curve is omitted for α = 1
since CVXPY [44] takes too long to converge due to the large number of optimization variables.
The NMSE for the case of 1-bit quantized pilot measurements is defined slightly differently, since
in one-bit measurements, we cannot determine the relative scaling factor for the reconstructed channel
matrices:

NMSE = E[ ||H − κĤ||_2^2 / ||H||_2^2 ],    (19)

where κ = arg min_κ ||H − κĤ||_2^2 for a given H and Ĥ. Note that though this may seem genie-aided, the
precoder optimization that finally determines the achievable rate is not affected by this scaling factor.
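A sketch of computing (19) is shown below; the closed-form minimizer κ = Re⟨Ĥ, H⟩/||Ĥ||² assumes a real scaling factor, which (19) does not specify.

```python
import numpy as np

def scaled_nmse(H, H_hat):
    # closed-form real kappa minimizing ||H - kappa * H_hat||^2
    kappa = np.real(np.vdot(H_hat, H)) / np.linalg.norm(H_hat) ** 2
    return np.linalg.norm(H - kappa * H_hat) ** 2 / np.linalg.norm(H) ** 2
```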
The dependence of NMSE on SNR for one-bit measurements with varying number of pilots is shown
in Fig. 5, and contrasted with the performance of EM-GM-AMP on the same measurements. As one can clearly see, the GCE brings about an immense improvement in NMSE, and this can be attributed to the rich prior learnt by the generator.
To validate the improvement in channel estimate quality postulated in Section V-B, we calculate
the spectral efficiency obtained using hybrid precoding in the data transmission phase. We assume
Ns = min(Nt , Nr ) = 16 and optimal unconstrained combiners are employed at the UE. The RF
and baseband precoders FRF and FBB are computed as explained in Section II using OMP. Three
different channel estimates are used for designing these precoders: the estimate returned by the GCE,
the AMP algorithm EM-GM-AMP [11] and the ground truth channel realization (for computing the
perfect CSI curve). The spectral efficiency vs. SNR plots are shown in Fig. 6 for varying α. As is
evident, the GCE channel estimate enables the design of precoders that support higher capacity data transmission.
Fig. 6. Spectral Efficiency vs. SNR as a function of α = Np/Nt with one-bit quantization, using OMP-based precoding. Curves are shown for perfect CSI, and for GCE and EM-GM-AMP with α ∈ {0.2, 0.4, 0.75}.
The benefit obtained from the GCE is clear in the low pilot density and low SNR regime. As the
number of pilot symbols increases, the performance of standard CS-based methods gets closer to the
GCE, and would be similar to that of the GCE for Np ≥ Nt . At low SNR, the pilot measurements
received are of very poor quality, hence CS-based methods do not perform well, but the GCE utilizes
its prior to obtain performance that cannot be achieved by the CS-based methods. This is clearly
evident in the one-bit quantized case (Fig. 5), where the GCE curves are roughly parallel to the
EM-GM-AMP curves with the constant gap being the generative prior gain. It can be expected that
as the number of antennas packed onto a planar array increases with the move toward THz carrier
frequencies, sending an adequate number of pilots would lead to an unsustainable overhead, and
recovering the channel estimate from an insufficient number of pilots will become critical. While the
GCE outperforms the three CS-based methods, it is important to note the following caveats:
High Spatial Correlation: GCE required a reduced antenna spacing of λ/10, rather than λ/2, to
successfully learn the channel distribution. As shown in Fig. 7(a), the singular value profile of a
λ/2 channel realization has a higher effective rank than a λ/10 realization, due to its lower spatial
correlation. As a consequence, the generator of a GAN trained on λ/2 channel realizations was unable
to learn the underlying probability distribution and the resulting performance of GCE was poor as
shown on the right in Fig. 7(b) for α = 0.75. Since a GAN was originally designed to learn the prior
for image datasets, which have extremely high spatial correlation, the GCE was also found to work
due to their tiny size, it is expected that such singular value profiles will become more commonly
observed and only the maximum eigenvector will be needed to acheive capacity in this regime.
A recent paper [47] shows how metamaterial antennas can be used for wireless communications,
including LTE and WiFi. Conventional antennas that are very small compared to the wavelength
reflect most of the signal back to the source. However, a metamaterial antenna steps up the antenna's
radiated power and behaves as if it were much larger than its actual size, because its novel structure
stores and re-radiates energy, which could lead to the deployment of sub-wavelength antennas.
Fig. 7. (a) Singular values of channel realizations, in descending order of magnitude, for antenna spacings of λ/2 and λ/10. (b) NMSE vs. SNR for the two datasets of channel realizations. The higher correlation in the λ/10 realization enables the generator to learn a rich prior and the GCE to obtain a significantly lower NMSE.
Rich Generative Prior: The weights θg of the generator G(z; θg ) encode a probability distribution
over the space of permissible spatial channel matrices, such that by inputting z, we can draw samples
from that distribution. Conventional CS techniques have no such prior knowledge of the distribution of the channel matrices; however, they capitalize on the sparsity of the beamspace representation of
the channel, which the GCE does not utilize. The results seem to indicate that the generative prior
is much more informative than the sparsifying basis, but we have no means of quantifying this yet.
Recent efforts in theoretical machine learning [48] have attempted to quantify the information in the
weights of a NN in terms of the impact that perturbing a weight has on the cross-entropy loss. Such
work could prove very useful in quantifying the information gain of a generative prior.
Training on Simulated Channel Realizations: We have currently trained a GAN using simulated
channel realizations, since obtaining realistic channel data has not proven possible, even with our
industry partners. One can only hope to recover the channel estimate based on pilot measurements
from current transceiver chips. It remains to be seen if the GAN can succeed in learning the channel
distribution even from these noisy channel estimates. The original GAN proposed in [35] is known to
learn discriminators with poor generalization capabilities, and many recent works [30], [32], [49] have taken different approaches to designing custom objective functions for the discriminator that help the generator better approximate the target distribution and improve generalization.
E. Timing Analysis
Using the PyTorch based generative model, optimization of (8) involves only performing gradient
descent with respect to z ∈ R^d, with d = 35 in our case. Hence one would expect each iteration to be fast. We performed a comparison of its execution time per iteration with that of the CS baselines, and the results are tabulated in Table III. The number of iterations required to achieve the NMSE results in Fig. 4 for each method are also given in Table III. The evaluation of the first three methods was performed on an Intel i9-8950HK CPU @ 2.90GHz. The results for GCE are given both when performed on the Intel i9-8950HK CPU without acceleration as well as when accelerated using a Nvidia GeForce GTX 2070 GPU. As expected, a GPU speeds up backpropagation through the NN immensely, as seen in Table III.
TABLE III: Comparison of execution time per iteration (in milliseconds) for OMP, Lasso, EM-GM-
AMP and GCE on a single channel realization at an SNR of -10 dB.
The most important finding from Table III is that the execution time of GCE is not noticeably affected by the increase in the number of pilot symbols, while the execution time of the CS baselines grows rapidly with Np. Each gradient descent iteration in the GCE multiplies A with the Jacobian ∇z G(z), where each row of the matrix ∇z G(z) is d = 35 dimensional. This involves only direct matrix multiplications with A. On the other hand, for OMP, one of the steps involves inverting the set of columns of Asp having maximum inner product with y, whose complexity scales as O(Np^m) with 2 ≤ m < 3. Similarly, the Lasso and EM-GM-AMP optimization problems have complexity scaling with Np^m.
Moreover, as explained in Section IV-E, Lasso involves solving an SOCP, hence takes much longer
than the other algorithms. The impact of Np on the execution time of GCE will only be seen at much higher values of Np, unlike the CS based algorithms, for which the impact of increasing Np is immediately apparent. Note however that the complexity of computing ∇z G(z) is quite high owing to the large number of weights θg in the trained generator G, hence the per-iteration execution time of the GCE remains substantial in absolute terms.
VI. CONCLUSION

We have proposed a novel approach to channel estimation using deep generative networks that achieves a significant performance gain over prior techniques for sparse signal recovery, when applied to CDL channel models. Notable aspects of this approach are that it does not require knowledge of the sparsifying basis of the channel and immensely reduces the number of pilots required to achieve the same NMSE as Lasso/OMP/EM-GM-AMP channel estimation, even in the case of one-bit quantized measurements. With the gradient of (8) requiring only matrix multiplications, its execution time is approximately independent of the number of pilot symbols.
VII. ACKNOWLEDGEMENTS
The authors would like to thank Nitin Myers for discussions on low resolution quantization and
Shilpa Talwar, Nageen Himayat, and Ariela Zeira at Intel for their invaluable support and technical advice
and feedback.
REFERENCES
[1] A. Doshi, E. Balevi, and J. G. Andrews, “Compressed representation of high dimensional channels using deep generative networks,”
in Proc. IEEE Signal Proc. Adv. in Wireless Comm. (SPAWC), May 2020.
[2] T. S. Rappaport, Y. Xing, O. Kanhere, S. Ju, A. Madanayake, S. Mandal, A. Alkhateeb, and G. C. Trichopoulos, “Wireless
communications and applications above 100 GHz: Opportunities and challenges for 6G and beyond,” IEEE Access, vol. 7, pp.
[3] H. Elayan, O. Amin, R. M. Shubair, and M.-S. Alouini, “Terahertz communication: The opportunities of wireless technology
beyond 5G,” in IEEE Intl. Conf. on Advanced Comm. Technologies and Networking (CommNet), Apr. 2018, pp. 1–5.
[4] E. Björnson, J. Hoydis, L. Sanguinetti et al., "Massive MIMO networks: Spectral, energy, and hardware efficiency," Foundations and Trends in Signal Processing, vol. 11, no. 3-4, pp. 154-655, 2017.
[5] P. A. Eliasi, S. Rangan, and T. S. Rappaport, “Low-rank spatial channel estimation for millimeter wave cellular systems,” IEEE
Trans. on Wireless Communications, vol. 16, no. 5, pp. 2748–2759, Apr. 2017.
[6] W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak, “Compressed channel sensing: A new approach to estimating sparse
multipath channels,” Proc. IEEE, vol. 98, no. 6, pp. 1058–1076, Jun. 2010.
[7] W. U. Bajwa, A. Sayeed, and R. Nowak, “Sparse multipath channels: Modeling and estimation,” in IEEE 13th Digital Signal
Processing Workshop and 5th IEEE Signal Processing Education Workshop, 2009, pp. 320–325.
[8] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, “Channel estimation and hybrid precoding for millimeter wave cellular
systems,” IEEE J. Sel. Topics Sig. Process., vol. 8, no. 5, pp. 831–846, Oct. 2014.
[9] R. Méndez-Rial, C. Rusu, N. González-Prelcic, A. Alkhateeb, and R. W. Heath, “Hybrid MIMO architectures for millimeter
wave communications: Phase shifters or switches?” IEEE Access, vol. 4, pp. 247–267, Jan. 2016.
[10] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in IEEE Intl. Symposium on
[11] J. P. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture approximate message passing,” IEEE Trans. on Signal
[12] S. Rangan, P. Schniter, and A. K. Fletcher, “Vector approximate message passing,” IEEE Trans. on Info. Theory, vol. 65, no. 10,
[13] J. Mo, P. Schniter, N. G. Prelcic, and R. W. Heath, “Channel estimation in millimeter wave MIMO systems with one-bit
quantization,” in 48th Asilomar Conference on Signals, Systems and Computers, Nov. 2014, pp. 957–961.
[14] J. Mo, P. Schniter, and R. W. Heath, “Channel estimation in broadband millimeter wave MIMO systems with few-bit ADCs,”
IEEE Trans. on Signal Processing, vol. 66, no. 5, pp. 1141–1154, Dec. 2017.
[15] X. Lin, S. Wu, C. Jiang, L. Kuang, J. Yan, and L. Hanzo, “Estimation of broadband multiuser millimeter wave massive
MIMO-OFDM channels by exploiting their sparse structure,” IEEE Transactions on Wireless Communications, vol. 17, no. 6, pp.
[16] D. Katselis, C. R. Rojas, M. Bengtsson, and H. Hjalmarsson, “Frequency smoothing gains in preamble-based channel estimation
for multicarrier systems,” Signal Processing, vol. 93, no. 9, pp. 2777–2782, Sep. 2013.
[17] H. Ye, G. Y. Li, and B.-H. Juang, “Power of deep learning for channel estimation and signal detection in OFDM systems,” IEEE
[18] H. He, C.-K. Wen, S. Jin, and G. Y. Li, "Model-driven deep learning for joint MIMO channel estimation and signal detection,"
[19] Y. Yang, F. Gao, X. Ma, and S. Zhang, “Deep learning-based channel estimation for doubly selective fading channels,” IEEE
[20] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Deep learning-based channel estimation for beamspace mmWave massive MIMO
systems,” IEEE Wireless Communications Letters, vol. 7, no. 5, pp. 852–855, Oct. 2018.
[21] X. Ru, L. Wei, and Y. Xu, “Model-driven channel estimation for OFDM systems based on image super-resolution network,”
[22] P. Dong, H. Zhang, G. Y. Li, I. S. Gaspar, and N. NaderiAlizadeh, “Deep CNN-based channel estimation for mmWave Massive
MIMO systems,” IEEE J. Sel. Topics Sig. Process., vol. 13, no. 5, pp. 989–1000, Jul. 2019.
[23] S. Gao, P. Dong, Z. Pan, and G. Y. Li, “Deep learning based channel estimation for massive MIMO with mixed-resolution
[24] E. Balevi and J. G. Andrews, “Deep learning-based channel estimation for high-dimensional signals,” arXiv preprint
arXiv:1904.09346, 2019.
[25] E. Balevi, A. Doshi, and J. G. Andrews, “Massive MIMO Channel Estimation with an Untrained Deep Neural Network,” IEEE
Trans. on Wireless Communications, vol. 19, no. 3, pp. 2079–2090, Jan. 2020.
[26] R. Heckel and P. Hand, “Deep Decoder: Concise Image Representations from Untrained Non-convolutional Networks,” in Proc.
[27] C.-K. Wen, W.-T. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,” IEEE Wireless Communications Letters,
[28] Z. Lu, J. Wang, and J. Song, “Multi-resolution CSI feedback with deep learning in Massive MIMO system,” arXiv preprint
[29] A. Bora, A. Jalal, E. Price, and A. G. Dimakis, “Compressed sensing using generative models,” in Intl. Conf. on Machine
[30] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in Intl. Conf. on Machine Learning
[31] O. El Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, “Spatially sparse precoding in millimeter wave MIMO systems,”
IEEE Trans. on Wireless Communications, vol. 13, no. 3, pp. 1499–1513, Jan. 2014.
[32] A. Srivastava, L. Valkov, C. Russell, M. U. Gutmann, and C. Sutton, “Veegan: Reducing mode collapse in GANs using implicit
variational learning,” in Adv. in Neural Info. Process. Systems, Dec. 2017, pp. 3308–3318.
[33] T. J. O'Shea, T. Roy, and N. West, "Approximating the void: Learning stochastic channel models from observation with variational
generative adversarial networks,” in IEEE Intl. Conf. on Computing, Net. and Comm., Apr. 2019, pp. 681–686.
[34] H. Ye, G. Y. Li, B.-H. F. Juang, and K. Sivanesan, “Channel agnostic end-to-end learning based communication systems with
[35] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial
nets,” in Adv. in Neural Info. Process. Systems, Dec. 2014, pp. 2672–2680.
[36] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial
[37] N. J. Myers, K. N. Tran, and R. W. Heath Jr, “Low-rank mmWave MIMO channel estimation in one-bit receivers,” arXiv preprint
[38] S. Qiu, X. Wei, and Z. Qiu, “Robust One-Bit Recovery via ReLU Generative Networks: Improved Statistical Rates and Global
[39] M. J. Powell, “An efficient method for finding the minimum of a function of several variables without calculating derivatives,”
[40] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic
differentiation in pytorch,” Neural Info. Process. Systems (NIPS) Workshop Autodiff, Oct. 2017.
[41] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. ICLR, Dec. 2014.
[42] A. M. Sayeed, “Deconstructing multiantenna fading channels,” IEEE Trans. Signal Process., vol. 50, no. 10, pp. 2653–2579, Oct.
2002.
[43] S. Winter, H. Sawada, and S. Makino, "On real and complex valued ℓ1-norm minimization for overcomplete blind source
separation,” in IEEE Wkshp on Appl. of Sig. Process. to Audio and Acoustics, Nov. 2005, pp. 86–89.
[44] S. Diamond and S. Boyd, “CVXPY: A Python-embedded modeling language for convex optimization,” Journal of Machine
Learning Research, vol. 17, no. 83, pp. 1–5, Apr. 2016.
[45] P. Schniter, “Turbo reconstruction of structured sparse signals,” in 2010 44th Annual Conference on Information Sciences and
[46] A. M. Tulino, A. Lozano, and S. Verdú, “Capacity-achieving input covariance for single-user multi-antenna channels,” IEEE
[47] M. M. Hasan, M. R. I. Faruque, and M. T. Islam, “Dual band metamaterial antenna for LTE/Bluetooth/WiMAX system,” Scientific
[48] A. Achille and S. Soatto, “Where is the information in a deep neural network?” arXiv preprint arXiv:1905.12213, May 2019.
[49] H. Thanh-Tung, T. Tran, and S. Venkatesh, “Improving generalization and stability of generative adversarial networks,” in Proc.